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Abstract 

The  introduction  of  the  Global  Positioning  System  changed  the  way  the  United 
States  Air  Force  fights,  by  delivering  world-wide,  precision  navigation  capability  to 
even  the  smallest  platforms.  Unfortunately,  the  Global  Positioning  System  signal  is 
not  available  in  all  combat  environments  (e.g.,  under  tree  cover,  indoors,  or  under¬ 
ground).  Thus,  operations  in  these  environments  are  limited  to  non- precision  tactics. 
The  motivation  of  this  research  is  to  address  the  limitations  of  the  current  preci¬ 
sion  navigation  methods  by  fusing  imaging  and  inertial  systems,  which  is  inspired  by 
observing  the  navigation  capabilities  of  animals.  The  research  begins  by  rigorously 
describing  the  imaging  and  navigation  problem  and  developing  practical  models  of 
the  sensors,  then  presenting  a  transformation  technique  to  detect  features  within  an 
image.  Given  a  set  of  features,  a  rigorous,  statistical  feature  projection  technique  is 
developed  which  utilizes  inertial  measurements  to  predict  vectors  in  the  feature  space 
between  images.  This  coupling  of  the  imaging  and  inertial  sensors  at  a  deep  level 
is  then  used  to  aid  the  statistical  feature  matching  function.  The  feature  matches 
and  inertial  measurements  are  then  used  to  estimate  the  navigation  trajectory  on¬ 
line  using  an  extended  Kalman  filter.  After  accomplishing  a  proper  calibration,  the 
image-aided  inertial  navigation  algorithm  is  then  tested  using  a  combination  of  sim¬ 
ulation  and  ground  tests  using  both  tactical  and  consumer- grade  inertial  sensors. 
While  limitations  of  the  extended  Kalman  filter  are  identified,  the  experimental  re¬ 
sults  demonstrate  a  navigation  performance  improvement  of  at  least  two  orders  of 
magnitude  over  the  respective  inertial-only  solutions. 
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Fusion  of  Imaging  and  Inertial  Sensors  for  Navigation 


I.  Introduction 


his  dissertation  outlines  a  research  effort  focused  on  fusing  optical  and  inertial 


_I_  sensors  for  robust,  self-contained,  passive,  autonomous  navigation.  This  effort 
is  motivated  by  the  requirement  for  autonomous  navigation  in  environments  where 
external  navigation  reference  sources  (such  as  the  Global  Positioning  System)  are 
unavailable. 

Precision  navigation  is  the  cornerstone  to  the  modern  concepts  of  precision  at¬ 
tack  and  synergistic  combat  operations.  The  Global  Positioning  System  (GPS)  was 
fielded  in  the  1980’s  and  first  used  for  precision  navigation  and  targeting  in  combat 
during  the  first  Gulf  War  [70].  This  new  capability  of  precision  attack  and  maneuver 
allowed  the  coalition  forces  to  fight  and  win  the  fastest,  lowest  casualty,  most  devas- 
tatingly  destructive  one-sided  war  in  recorded  history  [28].  Precision  navigation  is  a 
force  enabler  which  combines  the  classic  tenets  of  mass  and  surprise  in  ways  which 
were  previously  impossible,  leading  to  the  development  of  the  doctrine  of  precision 
combat  [70]. 

The  precision  combat  revolution  has  led  to  new  tactics  and  development  of  a 
highly-focused  combat  force  which  exploits  a  minimum  of  weapons  to  produce  the 
maximum  effect  on  the  enemy.  In  addition,  precision  combat  creates  minimal  collat¬ 
eral  damage  and  fratricide  [26].  The  Air  Force’s  reliance  on  precision  combat  tactics, 
combined  with  a  lack  of  alternative  precision  navigation  technologies,  has  led  to  a 
dependence  on  GPS  [26,57].  Unfortunately,  GPS  requires  a  direct  line  of  sight  to 
the  satellite  constellation  and  is  vulnerable  to  attack  or  disruption  from  internal  or 
external  sources  [26].  Thus,  precision  combat  is  currently  impossible  in  areas  not 
serviced  by  GPS  (e.g.,  under  trees,  indoors,  underwater,  underground  or  in  jamming 
environments) . 
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The  motivation  of  this  work  is  to  address  the  limitations  of  the  current  precision 
navigation  methods  by  fusing  optical  and  inertial  systems.  This  concept  is  inspired 
by  observing  the  precision  navigation  capabilities  of  animals.  Research  has  indicated 
that  animals  utilize  visual  and  inertial  observations  to  navigate  with  precision  in  the 
air,  land,  and  sea  [29].  This  powerful  natural  demonstration  of  navigation  principles 
guides  this  work,  which  builds  on  previous  research  to  make  precision  navigation 
using  visual  and  inertial  measurements  possible  for  military  applications.  Not  only 
does  the  fusion  of  optical  and  inertial  sensors  reduce  or  eliminate  the  reliance  on 
GPS,  it  introduces  the  overwhelming  advantages  of  precision  combat  into  mission 
areas  which  were  previously  impossible  such  as  urban  or  underground  combat. 

A  general  overview  of  the  available  measurements  and  sensor  fusion  concepts  is 
presented  in  the  next  section. 

1.1  Problem  Definition 

Navigation  is  defined  as  the  determination  of  relative  position,  velocity,  and 
orientation  from  a  predefined  reference  with  respect  to  time.  A  trajectory  is  defined 
as  the  continuous  time  history  of  position,  velocity,  and  orientation  of  a  body. 

In  order  to  estimate  a  trajectory,  sensors  are  employed  to  make  measurements 
of  physically  observable  properties  of  the  environment  which  provide  information  re¬ 
garding  the  navigation  state.  For  this  research,  three  different  types  of  passive  sensors 
are  utilized:  accelerometers,  gyroscopes,  and  imaging  sensors. 

1.1.1  Accelerometers.  Accelerometers  are  designed  to  measure  specific  force , 
which  is  the  sum  of  linear  acceleration  and  gravitation.  The  measurements  from  real 
sensors  are  corrupted  by  errors  which  are  described  in  further  detail  in  Chapter  II, 
Section  2.5.1.  In  general  terms,  the  accelerometer  measurement  is  a  nonlinear  function 
of  the  linear  acceleration,  gravitation,  and  measurement  error  sources  such  as  thermal 
noise,  manufacturing  errors,  etc.  A  simple  accelerometer  measurement  model  is  shown 
in  Figure  1.1. 
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ACCELEROMETER  MODEL 

Figure  1.1:  General  accelerometer  block  diagram.  The  ac¬ 

celerometer  transforms  the  specific  force  into  a  measurement  in 
the  computation  domain.  The  measurement  can  be  corrupted 
by  a  number  of  physical  sources. 

1.1.2  Gyroscopes.  Gyroscopes  measure  angular  rotation  rate  with  respect 
to  an  inertial  reference.  The  measurements  are  corrupted  by  errors  which  are  dis¬ 
cussed  in  further  detail  in  Chapter  II,  Section  2.5.2.  In  the  most  general  terms,  the 
gyroscope  measurement  is  a  nonlinear  function  of  the  angular  rotation  rate,  attitude, 
and  measurement  errors.  A  simple  measurement  model  for  a  rate  gyroscope  is  shown 
in  Figure  1.2. 

1.1.3  Imaging  Sensors.  Optical  imagers  measure  the  intensity  pattern  of 
light  focused  on  an  array  of  light  sensors.  This  two-dimensional  intensity  pattern  is 
defined  as  an  image.  The  scene  is  the  set  of  characteristics  of  the  environment  which 
emit  or  reflect  photons  toward  the  imaging  sensor.  Imaging  sensors  are  corrupted  by 
errors  which  are  also  discussed  in  further  detail  in  Chapter  II,  Section  2.7.  Thus,  in 
general  terms,  the  image  is  a  nonlinear  function  of  the  position,  attitude,  scene,  and 
measurement  errors  as  shown  in  Figure  1.3. 
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Figure  1.2:  General  rate  gyroscope  block  diagram.  The  rate 

gyroscope  transforms  the  physical  angular  rotation  rate  into  a 
measurement  in  the  computation  domain.  The  measurement 
can  be  corrupted  by  a  number  of  physical  sources. 
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Figure  1.3:  General  imaging  system  block  diagram.  The  imag¬ 
ing  system  transforms  the  scene  irradiance  pattern  into  a  mea¬ 
surement  in  the  computation  domain  or  image.  The  measure¬ 
ment  can  be  corrupted  by  a  number  of  physical  sources. 


5 


MEASURED  ACCELERATION 


MEASURED  ANGULAR  RATE 


MEASURED  IMAGE 

- ► 


KNOWN  PHYSICAL 
PROPERTIES 
OF  ENVIRONMENT 

•GRAVITY 
•ANGULAR  RATE 
•PHYSICAL 
TOPOGRAPHY 
•VISUAL  TOPOGRAPHY 
•SENSOR  MODELS 
•etc. 


NAVIGATION 

ESTIMATION 

FUNCTION 


POSITION 


VELOCITY 


ATTITUDE 


Figure  1.4:  Trajectory  estimation  block  diagram.  The  estima¬ 
tor  is  a  nonlinear  function  which  combines  sensor  measurements 
with  known  properties  of  the  physical  environment  to  determine 
the  navigation  state. 

1.1.4  Trajectory  Estimation.  The  overall  objective  of  this  research  is  to 
derive  a  statistically  rigorous,  online  method  to  estimate  the  trajectory,  based  upon 
accelerometer,  gyroscope,  and  imaging  sensor  measurements.  In  general  terms,  the 
trajectory  estimator  is  a  nonlinear  function  which  combines  measurements  from  in¬ 
ertial  and  optical  sensors  with  the  known  physical  properties  of  the  environment  to 
produce  an  estimate  of  position,  velocity,  and  attitude.  This  model  is  shown  in  Fig¬ 
ure  1.4. 

More  specifically,  the  deep  integration  (i.e.,  fusion)  of  optical  and  inertial  sensors 
are  explored  and  natural  synergies  between  the  sensors  are  exploited  [43].  This  in¬ 
cludes  topics  such  as  stochastic  projections,  optical  and  inertial  calibration,  predictive 
feature  tracking  algorithms,  and  long-term  navigation.  Ultimately,  a  mathematically 
rigorous  solution  is  sought  which  provides  a  self-contained  precision  navigation  capa- 
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bility  to  the  Air  Force  and  reduces  the  reliance  on  GPS.  The  specific  contributions  of 
this  research  are  presented  in  the  next  section. 

1.2  Research  Contributions 

As  mentioned  in  the  previous  section,  the  objective  of  this  research  is  to  exploit 
the  natural  synergy  between  imaging  and  inertial  sensors  through  deep  integration 
techniques,  that  is,  combining  image  and  inertial  sensor  data  at  a  more  fundamental 
level  in  order  to  exploit  mutual  synergy  between  the  two  devices  [43].  Specifically, 
the  following  contributions  to  the  science  of  navigation  are  claimed  as  a  result  of  this 
work. 

The  primary  contribution  is  the  definition  of  the  feature  space  and  development 
of  the  mathematics  required  to  perform  statistically  rigorous  transformations  in  this 
feature  space  in  order  to  predict  the  features  at  a  future  time.  This  stochastic  pro¬ 
jection  leverages  the  collection  of  inertial  measurements  between  images  in  order  to 
improve  the  accuracy  and  speed  of  the  attendant  feature  correspondence  algorithm. 
The  feature  space  and  the  stochastic  projection  algorithm  are  developed  in  a  general 
and  specific  sense  in  Chapter  IV. 

This  method  of  stochastic  projections  is  a  key  enabler  of  the  second  contribution, 
which  is  the  development  of  a  deeply- integrated,  online,  image  and  inertial  navigation 
algorithm  [68,69].  The  deep  level  of  integration  is  a  unique  feature  of  this  work  and 
results  in  the  integration  of  the  inertial  and  imaging  sensors  at  a  more  fundamental 
level  than  has  been  done  up  to  this  point,  which  results  in  a  new  level  of  synergy. 
This  synergy  can  be  summarized  in  the  following  manner.  Given  a  series  of  raw 
measurements  from  inertial  and  imaging  sensors,  these  measurements  are  filtered  at 
the  raw  measurement  level,  using  an  optimal  state  estimate  derived  from  the  collection 
of  previous  measurements.  In  other  words,  not  only  does  the  imaging  sensor  help 
estimate  (and  correct)  errors  in  the  inertial  sensor,  the  inertial  sensor  helps  estimate 
(and  correct)  errors  in  the  imaging  sensor,  or,  more  specifically,  the  feature  space 
transformation  and  of  the  image.  This  technique  is  described  in  Chapter  IV. 
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In  addition  to  the  two  primary  contributions,  two  contributions  are  made  in 
the  calibration  of  imaging  and  inertial  sensors.  The  first  contribution  is  the  addition 
of  self-calibration  states  to  the  Kalman  filter  which  allows  the  imaging  and  inertial 
sensor  to  estimate  calibration  parameters  using  known  optical  references  (e.g.,  celestial 
bodies  and  surveyed  ground  points)  and  independent  measurements  of  the  trajectory, 
such  as  GPS.  This  technique  is  evaluated  using  simulation,  ground,  and  flight  data. 
Secondly,  the  achievable  limits  of  calibration  accuracy  are  examined  as  a  relation  to 
the  fundamental  limit  of  sensor  accuracy. 

The  final  important  contribution  of  this  work  is  the  implementation  of  the  deep- 
integration  theory  in  an  experimental  imaging  and  inertial  navigation  sensor.  The 
sensor,  combined  with  the  algorithms  developed  in  Chapter  IV,  demonstrated  the 
autonomous  operation  of  an  image  and  inertial  sensor  in  both  the  ground  and  flight 
environment.  In  addition,  the  performance  improvement  under  varying  navigation 
scenarios  is  demonstrated  using  this  novel  navigation  system. 

This  dissertation  is  organized  as  follows.  Chapter  II  provides  a  detailed  math¬ 
ematical  background  for  the  optical  and  inertial  integration  problem,  including  a 
description  of  the  notation,  primary  coordinate  frames  and  transformations,  inertial 
and  optical  sensor  models,  kinematic  models,  and  an  example  of  an  online  estimation 
algorithm.  Chapter  III  presents  methods  used  to  solve  similar  problems  of  inter¬ 
est  and  discusses  contributions  and  potential  improvements.  Chapter  IV  derives  and 
presents  the  candidate  image  and  inertial  sensor  fusion  algorithm.  Chapter  V  provides 
methodology  for  calibration  and  boresight  of  optical  and  inertial  sensors.  Chapter  VI 
describes  the  experimental  data  collection  system  and  presents  the  results.  Finally, 
the  conclusions  regarding  the  overall  theory  and  concepts  of  image  and  inertial  navi¬ 
gation  as  well  as  potential  areas  for  future  exploration  are  presented  in  Chapter  VII. 


II.  Mathematical  Background 


his  chapter  reviews  the  mathematical  background  required  to  adequately  pose 


_I_  the  problem  of  optical  and  inertial  sensor  fusion.  The  chapter  begins  by  defining 
a  standard  mathematical  notation  which  is  used  throughout  the  document.  Next,  the 
navigation  reference  frames  are  defined  and  a  mathematical  technique  for  transform¬ 
ing  coordinates  between  reference  frames  is  defined.  The  concept  of  a  mathematical 
error  model  is  presented  to  provide  background  for  the  presentation  of  a  rigorous 
mathematical  model  of  the  components  in  inertial  and  optical  sensors,  including  mod¬ 
els  describing  their  respective  nonlinear  properties.  Next,  a  short  discussion  of  the 
Kalman  filter  is  presented  as  a  commonly  implemented  online  recursive  estimation 
algorithm.  In  addition,  the  method  of  optimal  batch  estimation  for  both  linear  and 
nonlinear  models  is  presented  in  order  to  complement  the  Kalman  filter  discussion. 
Finally,  a  method  to  rigorously  define  perturbations  of  direction  cosine  matrices  is 
presented,  which  is  of  fundamental  importance  for  the  remainder  of  the  document. 

2. 1  Mathematical  Notation 

The  following  mathematical  notation  is  used: 

•  Scalars:  Scalars  are  represented  by  upper  or  lower  case  letters  in  italic  type, 
(e.g.  x  or  G ) 

•  Vectors:  Vectors  are  denoted  by  lower  case  letters  with  bold  font,  (e.g.  x  or 
4>.)  The  vector  x  is  composed  of  a  column  of  scalar  elements  Xi,  where  i  is  the 
element  number. 

•  Unit  Vectors:  Vectors  with  unit  length,  as  defined  by  the  two  norm,  are 
denoted  by  the  check  character,  as  in  ||x||  =  1. 

•  Matrices:  Matrices  are  denoted  by  upper  case  letters  in  bold  font.  For  example, 
the  matrix  X  is  composed  of  elements  where  i  is  the  row  index  and  j  is  the 
column  index. 
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•  Vector  Transpose:  The  vector  (or  matrix)  transpose  is  identified  by  a  super¬ 
script  Roman  T,  as  in  xT. 

•  Estimated  variables:  Variables  which  are  estimates  of  random  variables  are 
denoted  by  the  hat  character,  as  in  x. 

•  Computed  Variables:  Variables  which  are  corrupted  by  errors  are  denoted 
by  the  tilde  character,  as  in  x. 

•  Homogeneous  Coordinates:  Vectors  in  homogeneous  coordinate  form  are 
denoted  by  an  underline ,  as  in  x.  Homogeneous  coordinate  vectors  are  defined 
to  have  a  value  of  1  in  the  last  element. 

•  Direction  Cosine  Matrices:  Direction  cosine  matrices  from  frame  a  to  frame 
b  are  denoted  by  Cba. 

•  Reference  Frame:  If  a  vector  is  expressed  in  a  specific  reference  frame,  a 
superscript  letter  is  used  to  designate  the  reference  frame  (e.g.,  pa  is  a  vector 
in  the  a  frame.) 

•  Relative  Position  or  Motion:  In  cases  were  is  it  important  to  specify  relative 
motion,  combined  subscript  letters  are  used  to  designate  the  frames  (e.g.,  uy,/, 
represents  the  rotation  rate  vector  from  frame  a  to  frame  b.) 

2.2  Reference  Frames 

The  concept  of  navigation  reference  frames  is  an  important  fundamental  for 
expressing  the  position,  velocity,  and  orientation  of  a  body.  For  this  research,  the 
following  reference  frames  are  defined  based  on  those  presented  in  [7]  and  [64]: 

•  True  inertial  frame  (I- frame) 

•  Earth-fixed  inertial  frame  (i- frame) 

•  Earth-centered  Earth-fixed  frame  ( e-frame ) 

•  Navigation  frame  ( n’-frame ) 
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•  Earth- fixed  navigation  frame  (n- frame) 

•  Body  frame  {b- frame) 

•  Binocular  disparity  frame  ( c0-frame ) 

•  Camera  frame  (c- frame) 

Each  of  these  are  described  in  the  paragraphs  that  follow. 

The  true  inertial  frame  (/-frame)  is  a  theoretical  reference  frame  where  Newton’s 
laws  of  motion  apply.  The  frame  is  defined  by  an  non-accelerating,  non-rotating 
orthonormal  basis  in  3ft3.  Because  of  the  relative  nature  of  the  universe,  the  true 
inertial  frame  has  no  predefined  origin  or  orientation. 

The  Earth-centered  inertial  frame  (i-frame)  is  an  orthonormal  basis  in  3ft3,  with 
origin  at  the  center  of  mass  of  the  Earth.  The  x  and  y  axes  are  located  on  the  equa¬ 
torial  plane,  aligned  with  so-called  fixed  stars.  The  z-frame  is  a  non-rotating  frame 
but  does  accelerate  with  respect  to  the  true  inertial  frame,  because  the  Earth  revolves 
about  the  sun,  the  sun  revolves  about  our  galaxy,  etc.  For  terrestrial  navigation 
purposes,  however,  it  can  be  considered  an  inertial  reference. 

The  Earth-centered  Earth- fixed  frame  (e-frame)  is  an  orthonormal  basis  in  3ft3, 
with  origin  also  at  the  Earth’s  center  of  mass.  The  e-frame  is  rigidly  attached  to 
the  Earth,  with  the  x  axis  on  the  equatorial  plane  pointing  toward  the  Greenwich 
meridian,  the  z  axis  aligned  with  the  north  pole,  and  the  y  axis  on  the  equatorial  plane 
pointing  toward  90  degrees  east  longitude.  Because  the  e-frame  is  a  true  Cartesian 
reference  frame,  some  navigation  computations  are  simplified. 

The  vehicle- fixed  navigation  frame  (n’-frame)  is  an  orthonormal  basis  in  3ft3,  with 
origin  located  at  a  predefined  point  on  a  vehicle  (e.g.,  the  vehicle’s  center  of  gravity  or 
the  center  of  a  triad  of  inertial  sensors,  etc.)  The  vehicle-fixed  navigation  frame’s  x,y, 
and  z  axes  point  in  the  north,  east  and  down  (NED)  directions,  respectively.  Although 
the  concept  of  down  is  open  to  interpretation,  for  the  purposes  of  this  research  down 
is  defined  as  the  direction  a  plumb  line  would  point  due  to  gravity.  The  n'-frame 
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Figure  2.1:  Inertial,  Earth  and  vehicle-fixed  navigation  frame. 

The  inertial  and  Earth  frames  originate  at  the  Earth’s  center  of 
mass  while  the  vehicle-fixed  navigation  frame’s  origin  is  located 
at  a  fixed  location  on  a  vehicle. 

rotates  with  respect  to  the  e-frame  due  to  translational  motion  of  the  vehicle.  The  i, 
e  and  n’- frames  are  illustrated  in  Figure  2.1.  The  n-frame  is  illustrated  in  Figure  2.2. 

The  Earth- fixed  navigation  frame  (n- frame)  is  an  orthonormal  basis  in  T3 , 
with  origin  located  at  a  predefined  location  on  the  Earth,  typically  on  the  surface. 
The  Earth-fixed  navigation  frame’s  x,  y,  and  z  axes  point  in  the  north,  east,  and 
down  (NED)  directions  relative  to  the  origin,  respectively.  As  in  the  previous  case, 
down  is  defined  as  the  direction  of  the  gravity  vector.  In  contrast  to  the  navigation 
frame,  the  Earth-fixed  navigation  frame  remains  fixed  to  the  surface  of  the  Earth. 
While  this  frame  is  not  useful  for  very-long  distance  navigation,  it  can  simplify  the 
navigation  kinematic  equations  for  local  navigation  routes. 

The  body  frame  (6-frame)  is  an  orthonormal  basis  in  T3,  rigidly  attached  to  the 
vehicle  with  origin  co- located  with  the  navigation  frame.  The  x,  y,  and  z  axes  point 
out  the  nose,  right  wing,  and  bottom  of  an  aircraft,  respectively.  Strapdown  inertial 
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Figure  2.2:  Earth-fixed  navigation  frame.  The  Earth-fixed  navigation  frame  is  a 

Cartesian  reference  frame  which  is  perpendicular  to  the  gravity  vector  at  the  origin 
and  fixed  to  the  Earth. 
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Figure  2.3:  Aircraft  body  frame  illustration.  The  aircraft 

body  frame  originates  at  the  aircraft  center  of  gravity. 

sensors  are  fixed  to  the  6-frame,  although  they  may  not  be  located  at  the  origin  or 
aligned  with  the  axes.  The  6-frame  is  shown  in  Figure  2.3. 

The  camera  frame  (c-frame)  is  an  orthonormal  basis  in  T3,  rigidly  attached  to  a 
camera,  with  origin  at  the  camera’s  optical  center.  The  x  and  y  axes  point  up  and  to 
the  right,  respectively,  and  are  parallel  to  the  image  plane  of  the  camera.  The  z  axis 
points  out  of  the  camera  perpendicular  to  the  image  plane.  The  c-frame  is  shown  in 
Figure  2.4. 

The  binocular  disparity  frame  (co-frame)  is  an  orthonormal  basis  in  T3,  which  is 
rigidly  attached  to  the  lever  arm  located  between  cameras  in  a  binocular  configuration, 
with  origin  at  a  specified  point  on  the  lever  arm.  The  x,  y,  and  z  axes  point  forward, 
right,  and  down,  respectively.  The  c0-frame  is  shown  in  Figure  2.5. 
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Figure  2.4:  Camera  frame  illustration.  The  camera  reference 
frame  originates  at  the  optical  center  of  the  lens. 


cb  frame 


Figure  2.5:  Binocular  disparity  frame  illustration.  The  binoc¬ 
ular  disparity  frame  originates  at  the  midpoint  between  the  op¬ 
tical  center  of  the  two  camera  frames,  ca  and  c&. 
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2.3  Coordinate  Transformations 

Coordinate  transformations  describe  the  relationships  between  vectors  expressed 
in  various  reference  frames.  As  discussed  in  [64],  the  available  transformations  are 
classified  as  having  either  three  or  four-parameters.  Three  parameter  transformations 
suffer  from  singularities  at  certain  orientations,  whereas  four  parameter  transforma¬ 
tions  do  not.  For  the  purposes  of  this  research,  the  relevant  transformations  are  Euler 
angles  and  direction  cosine  matrices. 

Euler  angles  are  elements  of  a  three-component  vector  corresponding  to  a  specific 
sequence  of  single-axis  rotations  to  rotate  from  one  reference  frame  to  another.  A 
common  application  of  Euler  angles  is  expressing  the  transformation  from  the  body 
to  navigation  frame  of  an  aircraft  as  a  roll,  pitch  and  yaw  angle.  In  this  example,  the 
singularities  can  be  seen  at  pitch  angles  of  ±90  deg. 

Direction  cosine  matrices  (DCM)  are  four-parameter  transformations  expressed 
as  a  3  x  3  matrix.  The  matrix  consists  of  the  inner  product  (or  Cosines )  of  each  unit 
basis  vector  in  one  frame  with  each  unit  basis  vector  in  another  frame.  The  matrix 
form  of  the  direction  cosine  is  convenient  for  transforming  vectors,  as  in: 

p"  =  c‘p“  (2.1) 

where  Cba  is  defined  as  the  matrix  of  the  inner  products  of  the  unit  basis  vectors. 
Each  element  (C*  )  corresponds  to  the  scalar  projection  (or  cosine )  of  the  i-th  axis 

of  the  a  frame  onto  the  j- th  axis  of  the  b  frame. 

When  used  to  transform  right-hand  Cartesian  coordinates,  the  DCM  has  the 
following  properties: 

i.  Det( ;cy  =  |c‘|  =  i 
2-  cj  =  (cjy1  =  (cyT 

3.  CJ  =  CJCJ 
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The  time  derivative  of  the  direction  cosine  matrix  is  defined  as  [64] 


W 

a  ba 


(2-2) 


where  fl(/m  is  the  skew-symmetric  form  of  the  rotation  vector  from  frame  b  to  frame  a, 
expressed  in  the  a  frame  (u;“6).  The  skew-symmetric  operator  of  a  vector  u>  is  defined 


as 


0  C 0  z  OJ y 


(wx)=  u>z  0  —uix 


(2.3) 


0 


where  lux,  u>y,  and  uz  are  the  scalar  elements  of  u ;. 

The  above  development  allows  expression  of  the  navigation  state  (position,  ve¬ 
locity  and  orientation)  in  a  convenient  vector  notation.  In  addition,  vectors  can  be 
easily  transformed  between  reference  frames,  even  when  the  frame  is  rotating.  In  the 
next  section,  the  mathematics  of  perturbation  analysis  and  error  models  are  presented. 

2.4  Error  Models  and  Perturbation  Analysis 

In  many  cases,  properties  of  physical  systems  are  well-modeled  using  systems  of 
nonlinear  differential  equations  [37] .  Consider  the  homogeneous  nonlinear  differential 


equation 


x(f)  =  f  [x(t),u(f),t] ;  x(to)=x0 


(2.4) 


For  a  given  input  function,  Uo(t),  and  initial  condition,  x0,  there  exists  a  nominal 
solution  trajectory: 


x(f);  te[t0, 00) 


(2.5) 


Given  this  nominal  solution  trajectory,  the  effects  of  small  changes  (i.e.,  perturbations) 
in  the  initial  conditions  or  input  function  can  be  expressed  using  a  Taylor  series 
expansion  about  the  nominal  solution  trajectory.  These  perturbations  are  denoted 
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using  the  following  notation: 


x(t)  =  x(t)  +  hx(t); 

t  e  [t0,  oo) 

(2.6) 

u(t)  =  u(t)  +  Su(t)] 

t  e  [t0,  oo) 

(2.7) 

The  Taylor  series  approximation  about  the  nominal  is  then 


x(t)  =  f  [x(t),  u(t),  t]  +  F (t)hx(t)  +  B(t)hu(t)  +  e(x(t),  u (t),  t)  (2.8) 


where  e(x(t),  u (t),  t)  — >  0  as  5x  — >  0  and  (5u  — ►  0  and 
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(2.10) 


By  neglecting  the  £  term  and  subtracting  the  nominal  solution,  a  time-varying  linear 
differential  equation  remains.  This  equation  is  an  approximation  of  the  difference 
from  the  nominal  trajectory  and  is  called  the  linear  perturbation  equation  [37]: 


hx(t)  =  F  (t)hx(t)  +  B(t)hu  {t) 


(2.11) 


A  time- varying  linear  differential  equation  has  a  unique  solution,  which  becomes 
explicit  when  F (t)  and  B(t)5u(t)  are  piecewise  continuous  [37].  Under  these  condi¬ 
tions,  the  solution  to  the  time- varying  linear  differential  perturbation  equation  (2.11) 
is  given  by 

5x(t)  =  <I?(t,  to)^xo  +  f  3*(t,  T)B(r)hu(r)dr  (2-12) 

Jt0 

where  <]>(t,  to)  is  the  state  transition  matrix  from  time  to  to  time  t. 
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The  perturbation  solution’s  linear  form  allows  the  use  of  linear  estimation  meth¬ 
ods  (e.g.,  Kalman  filtering)  and  analysis  tools  which  is  discussed  in  more  detail  in 
Section  2.8. 

2. 5  Inertial  Navigation 

In  this  section,  basic  inertial  navigation  concepts  are  presented,  including  a  dis¬ 
cussion  of  the  strapdown  navigation  sensors,  the  measurements  available,  a  derivation 
of  the  system  dynamics,  and  the  resulting  differential  error  equations. 

2.5.1  Accelerometer  Sensor  Model.  Strapdown  inertial  navigation  systems 
consist  of  accelerometers  and  gyroscopes,  usually  mounted  in  orthogonal  triads.  Con¬ 
trary  to  their  name,  accelerometers  do  not  directly  measure  acceleration.  According 
to  Einstein’s  Theory  of  Special  Relativity  [9],  acceleration  due  to  motion  relative  to 
inertial  space  and  acceleration  due  to  gravitation  (or  mass  attraction)  are  indistin¬ 
guishable.  Thus,  accelerometers  produce  a  measurement  of  specific  force ,  which  is 
the  difference  between  the  acceleration  relative  to  inertial  space  and  the  gravitation 
vector,  defined  by 

f  =  p-g  (2.13) 

where  f  is  the  specific  force  vector,  p  is  the  position  vector,  and  g  is  the  gravitation 
vector. 

The  accelerometer  output  is  corrupted  by  a  number  of  error  sources,  which  are  a 
function  of  the  design  of  the  sensor  [64] .  In  general  terms,  the  dominant  accelerometer 
errors  are  as  follows: 

•  Bias:  A  bias  is  a  constant  or  slowly-varying  additive  error.  Some  bias  com¬ 
ponents  can  be  measured  and  corrected  through  factory  calibration  techniques, 
although  others  will  inevitably  remain.  Biases  can  change  after  power  cycles  or 
after  temperature  fluctuations. 
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•  Scale  Factor:  A  scale  factor  error  is  a  constant  or  slowly-varying  multiplicative 
error.  As  with  bias  errors,  some  scale  factor  effects  can  be  corrected  through 
calibration. 

•  Sensor  Misalignment:  Sensor  misalignment  errors  are  a  result  of  mechanical 
fabrication  and  installation  errors.  These  errors  result  in  a  difference  between 
the  accelerometers’  sensitive  axes  and  the  platform  reference. 

•  Vibration:  These  errors  cause  a  measurement  bias  as  a  function  of  certain 
vibrational  modes. 

•  Measurement  Noise:  Measurement  noise  is  observed  as  an  additive  error 
component  with  high-bandwidth  power  spectral  density.  This  noise  component 
is  the  theoretical  result  of  many  high-bandwidth  sources  such  as  electrical  noise, 
thermal  noise,  etc. 

•  Gravity  Model  Errors:  Because  the  accelerometer  measures  specific  force, 
the  acceleration  due  to  local  gravity  must  be  added  to  the  accelerometer  output 
to  produce  an  estimate  of  acceleration  in  the  inertial  frame.  Errors  in  this  local 
gravity  model  produce  additive  errors  in  acceleration.  Note  the  gravity  model 
errors  are  not  errors  in  the  accelerometer  measurement  itself,  however  they  arise 
when  converting  from  specific  force  to  acceleration. 

The  above  error  sources  can  be  modeled  as  a  quadratic  function.  Thus,  the 
measured  accelerometer  output  in  the  body  reference  frame,  f^,  is 

C  =  fb  +  A1fl b  +  At2  (f 6) T  f 6 A2  +  ab  +  r,ba  (2.14) 

where  the  fb  vector  is  the  true  specific  force  in  the  body  frame,  Ai  and  A2  are  the 
combined  accelerometer  first  and  second-order  error  component  matrices,  ab  is  the 
bias  vector,  and  rjba  is  the  additive  measurement  noise  vector. 

2.5.2  Gyroscope  Sensor  Model.  The  gyroscopes  used  in  strapdown  navi¬ 
gation  systems  measure  angular  rate  relative  to  inertial  space,  u>bib.  By  rearranging 
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Eqn.  (2.2),  the  measurement  can  be  expressed  using  the  inertial  to  body  direction 
cosine  matrix  as 

=  C*cj  (2,15) 

where  Qbib  is  the  skew-symmetric  form  of  the  angular  rate  vector,  u)\b.  The  skew 
symmetric  form  is  defined  in  Eqn.  2.3. 

Gyroscopic  sensors  are  subject  to  errors  which  corrupt  the  angular  rate  mea¬ 
surement  [64].  The  dominant  gyroscopic  errors  are  as  follows: 

•  Fixed  Bias  (^-independent):  This  bias  is  a  constant  or  slowly- varying  addi¬ 
tive  error  which  is  independent  of  acceleration. 

•  Acceleration-dependent  Bias:  This  is  a  bias  which  is  a  function  of  the  cur¬ 
rent  acceleration  applied  to  the  sensor.  This  bias  has  components  which  act 
both  in  the  gyroscope’s  sensitive  axis  as  well  as  orthogonal  to  the  sensitive  axis. 

•  Anisoelastic  Bias:  The  anisoelastic  bias  component  is  proportional  to  the 
product  of  acceleration  along  orthogonal  axes. 

•  Scale  Factor:  A  scale  factor  error  is  a  constant  or  slowly- varying  multiplicative 
error. 

•  Sensor  Misalignment:  Sensor  misalignment  errors  are  a  result  of  mechanical 
fabrication  and  installation  errors.  These  errors  result  in  a  difference  between 
the  gyroscopes’  sensitive  axes  and  the  platform  reference. 

•  Measurement  Noise:  Measurement  noise  is  observed  as  an  additive  error 
component  with  high-bandwidth  power  spectral  density.  This  noise  component 
is  the  theoretical  result  of  many  high-bandwidth  sources  such  as  electrical  noise, 
thermal  noise,  etc. 

Similarly  to  the  accelerometer  errors,  the  gyroscopic  errors  can  also  be  modeled 
as  a  quadratic  function.  Thus,  the  measured  angular  rate  output  in  the  body  reference 
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frame,  u )bibm,  is 


uibm  -  ^ib  +  BlWiL  +  B2  (WiL)  W»mB2 

+B3f6  +  b6  +  ^  (2.16) 

where  Bi  and  B2  are  the  combined  gyroscopic  first  and  second  order  error  components, 
B3  is  the  acceleration-dependent  bias  term,  bh  is  the  bias,  and  rfbh  is  the  additive 
measurement  noise. 

2.5.3  Inertial  Navigation  Dynamics.  In  this  section,  the  nonlinear  dynamic 
equations  governing  a  strapdown  inertial  navigation  system  are  developed  from  the 
basic  navigation  equations.  These  dynamic  equations  are  used  in  the  next  section  to 
derive  a  linear  perturbation  error  model. 

The  error  equations  are  developed  in  the  Earth-fixed  navigation  frame  defined 
in  Section  2.2.  The  Earth- fixed  navigation  frame  is  preferred  for  the  following  reasons: 

•  The  Earth-fixed  navigation  frame  is  a  Cartesian  space  in  T3  which  makes  the 
units  consistent  between  axes  and  is  conducive  to  good  scaling. 

•  The  Earth-fixed  navigation  frame  results  in  a  simplified  dynamic  model  with 
minimal  loss  of  accuracy  over  short  distances. 

•  Earth-fixed  navigation  frame  cartesian  navigation  state  parameters  can  be  easily 
transformed  to  geodetic  coordinates  (i.e.,  latitude,  longitude,  and  altitude)  for 
interpretation  and  analysis. 

The  navigation  states  of  interest  are  measures  of  position,  velocity,  and  orien¬ 
tation  with  respect  to  the  navigation  frame,  defined  as: 

•  The  pn  vector  is  the  three-dimensional  position  of  the  vehicle  relative  to  the  ori¬ 
gin  of  the  Earth-fixed  navigation  frame,  expressed  in  the  Earth-fixed  navigation 
frame. 
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•  The  vn  vector  is  the  three-dimensional  velocity  of  the  vehicle  relative  to  the 
Earth- fixed  navigation  frame,  expressed  in  the  Earth- fixed  navigation  frame. 

•  The  CJ'  matrix  is  the  DCM  which  transforms  vectors  from  the  body  frame  to 
the  Earth- fixed  navigation  frame. 

The  origin  location  of  the  Earth-fixed  navigation  frame  is  defined  relative  to  the  ECEF 
frame  as  Pq.  The  orientation  of  the  navigation  frame  is  defined  by  the  direction 
cosine  matrix,  C® .  Because  the  Earth-fixed  navigation  frame  is,  by  definition,  rigidly 
attached  to  the  Earth,  the  position  and  orientation  are  static,  e.g.  Pq  =  Q  and  C®  =  0. 

2.5.4  Attitude  Dynamics.  Applying  the  definition  of  the  derivative  of  a 
DCM  provided  in  Eqn.  (2.2)  to  yields 

Cj  =  C  A  (2.17) 

where  f lbnb  is  the  Earth-fixed  navigation  frame-to-body  frame  angular  rate,  expressed 
in  the  b  frame  and  in  skew-symmetric  form.  In  vector  form,  this  vector  is  the  difference 
between  the  body-to-inertial  angular  rate  (u>^6),  the  Earth’s  sidereal  angular  rate  (cj(e), 
and  the  angular  rate  of  the  navigation  frame  (u>”n),  all  expressed  in  the  b  frame: 

4  =  4  ~  C‘uC  -  C‘C>|e  (2.18) 

By  definition,  the  angular  rate  of  the  navigation  frame  relative  to  the  Earth,  (u>gn), 
is  zero,  which  simplifies  the  above  equation  to 

4*  =  4  -  C‘CXe  (2.19) 

Converting  Eqn.  (2.19)  to  skew-symmetric  form,  substituting  into  (2.17),  and 
simplifying  yields  the  time- varying  attitude  dynamic  equation 

c£  =  (2.20) 
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2.5.5  Position  and  Velocity  Dynamics.  The  position  and  velocity  dynamics 
are  calculated  by  first  expressing  the  position  in  the  i  frame: 


c; 


[Po 


C'P”] 


(2.21) 


where  Pq  is  the  location  of  the  origin  of  the  navigation  frame,  with  respect  to  the 
e-frame,  and  Qen  is  the  DCM  from  the  navigation  frame  to  the  e-frame.  Taking  the 
derivative  of  2.21  using  the  chain  rule  yields 

I>‘  =  C,C«p„  +  C,n,e  [pe  +  Cep»]  (2,22) 


An  additional  derivative  yields  the  acceleration  with  respect  to  the  inertial  frame: 


P'  =  C)C'p"  +  2C‘f!'eCJp”  +  Cl  (fi'J2  [pg  +  C'p’ 


(2.23) 


Recalling  the  definition  of  specific  force  from  Eqn.  (2.13)  and  substituting  into 
(2.23)  yields 


f  *  +  g*  =  cicenpn  +  2cin*ecenpn  +  ci  (O  If  [p§ 


C!dT 


(2.24) 


Solving  for  the  acceleration  with  respect  to  the  Earth-fixed  navigation  frame  (pn) 
results  in 

p“  =  f"  -  2Cjngcy>”  -  ci  (n; if  [Pg  +  C'P“]  +  g”  (2,25) 

The  velocity  with  respect  to  the  Earth-fixed  navigation  frame  is  defined  as 

pn  =  vn  (2.26) 
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Substituting  (2.26)  into  (2.24)  and  expressing  the  specific  force  in  the  b  frame  yields 
the  time- varying  position  and  velocity  dynamic  equation: 


=  Off*  -  2Cjn',c'pn 


-  c“  (nfey  [pS 


C'P"] 


g" 


(2.27) 


In  the  next  section,  these  dynamic  equations  are  used  to  derive  error  dynamics 
using  perturbation  methods. 

2. 5. 6  Development  of  Observation  Equations.  The  development  of  the  strap- 
down  INS  dynamics  is  completed  with  respect  to  the  Earth-fixed  navigation  frame.  It 
is  necessary  in  some  cases  to  determine  the  relationship  between  coordinates  expressed 
in  the  Earth-fixed  navigation  frame  and  a  global  reference  system,  which  motivates 
the  development  of  an  appropriate  coordinate  system  transformation.  In  this  case, 
the  well-known  World  Geodetic  System  of  1984  (WGS-84)  is  used.  The  parameters 
of  interest  are  defined  as 

•  L  is  the  geodetic  latitude  in  radians. 

•  A  is  the  longitude  in  radians. 

•  h  is  the  height  above  the  WGS-84  ellipsoid  in  meters. 

•  vn'  is  the  velocity  vector  with  respect  to  the  Earth,  expressed  in  the  navigation 
frame. 

•  0, 0  and  ip  are  the  roll,  pitch  and  yaw  angles  (in  radians)  to  rotate  from  the 
navigation  frame  to  the  body  frame. 

Because  the  WGS-84  system  origin  is  located  at  the  center  of  the  Earth  [7], 
the  Earth-fixed  navigation  frame  parameters  are  transformed  to  the  Earth-centered 
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Earth- fixed  reference  frame  using  the  following  transformations: 


pe 

=  p5  +  c*p" 

(2.28) 

ve 

=  C“v" 

(2.29) 

fie  nn 

— 

(2.30) 

The  observed  parameters  are  calculated  in  two  steps.  Longitude  is  calculated  di¬ 
rectly  using  the  four-quadrant  arctangent  function  and  elements  of  the  ECEF  position 
vector,  pe,  as 

A  =  arctan  (~j\  (2.31) 

\Px  / 

The  geodetic  latitude  and  height  above  ellipsoid  are  calculated  by  solving  the  following 
nonlinear  inverse  relationships,  assuming  a  WGS-84  ellipsoidal  Earth  model  [7] : 

(Pi?  +  (pey)2  =  [N  +  h]2  cos2  L  (2.32) 

pez  =  [IV (1  —  e2)  +  h]  sinL  (2.33) 


where  N  is  the  geodetic  radius  of  meridian  in  meters,  defined  by 


N  =  .  (2.34) 

—  e2  sin2  L 

In  the  above  equation,  a  is  the  WGS-84  semi-major  axis,  and  e  is  the  WGS-84  ellipsoid 
eccentricity  [7].  The  navigation  frame  velocities  are  computed  using  the  earth  frame 
to  navigation  frame  (North-East-Down)  transformation  (C”*)  [64]: 


—  sin  L  cos  A  —  sin  L  sin  A  cos  L 

—  sin  A  cos  A  0 

—  cos  L  cos  A  —  cos  L  sin  A  —  sin  L 


(2.35) 
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The  navigation  frame  velocity  (v"/)  is  simply  the  matrix  multiplication  of  the 
earth  frame  to  navigation  frame  DCM  and  the  earth  frame  velocity  vector: 


Cne  ve 


(2.36) 


To  obtain  the  Euler  angles,  the  direction  cosine  matrix  from  the  navigation 
frame  to  the  body  frame  (C^/)  is  calculated  as 


ct  =  C‘(C”')5 


(2.37) 


The  Cbn,  matrix  can  be  expressed  equivalently  composed  of  Euler  angles  [64] : 


cos  xp  cos  9  sin  xp  cos  9  —sinO 

—  sin  xp  cos  <p  +  cos  xp  sin  9  sin  <p  cos  xp  cos  <p  +  sin  ip  sin  9  sin  <p  cos  9  sin  <p 
sin  xp  sin  <p  +  cos  ip  sin  9  cos  <p  —  cos  ip  sin  <p  +  sin  ip  sin  6  cos  <p  cos  6  cos  <p 


(2.38) 

where  <p,  9,  and  xp  are  the  roll,  pitch,  and  yaw  Euler  angles,  respectively.  Hence,  the 
Euler  angles  can  be  computed  using  terms  from  C bn,\ 


(p 

0 

pj 


arctan 


^,3 


arcsin  (C^,13) 


arctan 


(2.39) 

(2.40) 

(2.41) 


Note  that  the  use  of  Euler  angles  causes  a  singularity  when  9  =  ±|,  as  expected. 


2.6  Inertial  Navigation  Error  Model 

As  shown  in  the  previous  section,  the  inertial  navigation  dynamics  are  expressed 
as  nonlinear  differential  equations.  Analyzing  the  differences  between  the  output  of 
an  inertial  navigation  system  and  the  true  trajectory  motivates  the  development  of 
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an  inertial  navigation  error  model.  This  perturbation  model  can  be  used  to  estimate 
the  vehicle’s  true  trajectory  by  fusing  optical  and  inertial  sensors. 

2.6.1  Inertial  Sensor  Error  Model.  For  this  derivation,  the  accelerometer 
and  gyroscopic  errors  are  modeled  as  a  bias  plus  a  random  noise.  The  biases  are 
modeled  as  first-order  Gauss-Markov  processes  [37].  The  random  measurement  noise 
is  modeled  as  an  additive  white  Gaussian  noise  process.  The  resulting  accelerometer 
measurement  model  becomes 


t  =  f"  +  a6  +  w‘  (2.42) 

where  a6  is  the  accelerometer  bias  and  is  an  additive  white  Gaussian  noise  process. 
Similarly,  the  resulting  gyroscope  measurement  model  is 

Ibm  =  u ib  +  bb  +  wbb  (2.43) 

where  b6  is  the  gyroscope  measurement  bias  and  w{]  is  an  additive  white  Gaussian 
noise  process. 

The  bias  is  a  first-order  Gauss-Markov  process  which  is  expressed  by  the  follow¬ 
ing  differential  equations 

a‘  =  -fak  +  w^  (2.44) 

1  a 

b*  =  (2.45) 

The  time  constants  are  represented  by  ra  and  r&  for  the  accelerometer  and  gyroscopic 
biases,  respectively.  The  processes  are  driven  by  the  white  noise  terms,  w £  and 


2.6.2  Attitude  Error  Development.  The  development  of  the  attitude  error 
dynamics  begins  with  a  definition  of  the  error  states.  The  attitude  error  vector  (ip) 
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is  modeled  as  a  vector  of  small  angles  about  the  north,  east  and  down  axes  of  the 
navigation  frame  and  xpd,  respectively)  [71].  The  attitude  error  vector  is 

defined  as 


f/Vi. 

Ipe 

fa 


(2.46) 


Because  the  angular  errors  are  assumed  to  be  small,  the  computed  body-to- 
Earth-hxed  navigation  frame  DCM  can  be  expressed  as  [64] 


C?«[I-«.x)]C? 


(2.47) 


where  (?/>x)  is  the  skew-symmetric  form  of  t/>.  Taking  the  derivative  of  (2.47)  with 
respect  to  time  yields 


e?  =  -(v.x)c?  +  [I-«>x)]C?  (2.48) 

Substituting  (2.17)  into  (2.48)  yields 

CA  =  -  (V>  x)  c;  +  [I  -  x )]  CA  (2.49) 

Solving  for  (t/>x)  results  in 


(v.  X  )  =  [I  -  (V>  x )]  CACn  -  cAcbn  (2.50) 

The  calculated  body-to-earth-fixed  navigation  frame  rotation  rate  vector,  ushnh,  is  de- 
hned  as 


U)b,=L -CbCnue 

no  ibm  n  e  ie 


(2.51) 


where  0Jbib  is  the  body-to-inertial  angular  rate  vector  measured  by  the  gyroscopes. 
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Substituting  (2.47),  (2.51),  and  (2.43)  into  (2.50)  results  in 


(v-x)  =  [i  —  (i/’x)]  cjnyc4 

-  [I  -  (-0X)]  C?  [*&,  +  (b‘x)  -  [(C‘  [V-x]  CX«)  x] 

+  Wx)]Ct  (2.52) 

Simplifying  by  eliminating  common  terms  yields 

(*x)  =  -[I-»x)]C?[(b*x)-[(C*[V>x]CXe)x] 

+  (wJx)]C‘  (2.53) 

Eliminating  second-order  terms  and  collapsing  the  skew-symmetric  form  yields  the 

linearized  angular  error  differential  equation 

*  =  -  l(C>y  x]  t  -  CJb4  -  C?wJ  (2,54) 

2.6.3  Velocity  Error  Development.  The  development  of  the  velocity  error 
dynamics  proceeds  similarly  to  the  attitude  errors.  The  position  and  velocity  errors 
(hpn  and  5vn,  respectively)  are  defined  as 

Spn  =  p  n_pn  (2.55) 

5vn  =  vn-vn  (2.56) 

These  definitions  are  substituted  into  the  dynamics  model  to  derive  the  dynamics  of 
the  position  and  velocity  errors. 

Recall  the  acceleration  dynamics  equation  (2.27)  derived  in  Section  2.5.5: 

v"  =  cjf4  -  2cjf!'ec;p"  -  c :  (ny2  [pg  +  c>"]  +  g”  (2.57) 
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The  true  acceleration  with  respect  to  the  Earth- fixed  navigation  frame,  vn,  consists  of 
a  specific  force,  Coriolis  effects,  centripetal  acceleration,  and  a  gravitation  term.  The 
centripetal  acceleration  and  gravitation  terms  are  typically  combined  into  a  single 
gravity  vector,  ge(pe).  The  gravity  vector  is  formally  defined  as  the  gradient  of  the 
gravity  potential,  tT(pe),  shown  in  [7]  as  the  sum  of  the  gravitation  and  centripetal 
acceleration  as  a  function  of  Cartesian  position  relative  to  the  center  of  the  Earth: 

GM  1  T  T 

W'(p')  =  FTil  +  ^p'  «:,p‘  +  H.O.T.  (2.58) 

IIP  II  2 

where  GM  is  the  Earth’s  gravitational  constant.  Substituting  the  gravity  function 
into  Equation  (2.57)  yields  the  acceleration  dynamics  equation: 

Vn  =  Cnf6  _  2cneneiecenvn  +  c:ge  (pH  +  cenPn)  (2.59) 


The  calculated  velocity  vector  differential  equation  is  corrupted  by  both  the 
accelerometer  measurement  errors  as  well  as  the  attitude  errors  developed  in  Sec¬ 
tion  2.5.5: 


2C^C*  v"  +  C^g6 


(Po 


Ce-n) 


(2.60) 


Substituting  the  attitude,  velocity,  position,  and  accelerometer  measurement  error 
equations  into  (2.60)  yields 


v  =  [I-(t/.x)]C^(ft  +  ai  +  wJ)-2C:f2'eC'(v”  +  5v")  (2.61) 

+C“g'  (p§  +  CJp“  +  C'5p") 


The  acceleration  error  vector,  Svn,  is 

Sirn  =  <rn  -  vn 


(2.62) 
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Substituting  (2.57)  and  (2.61)  into  (2.62)  and  eliminating  second-order  terms  yields 


8wn  =  Cnbab  -  (t/>  x )  C£f&  -  2Cne£llCeJvn  +  CneGCeJpn  +  Cnbwba  (2.63) 

where  G  is  the  gradient  of  the  gravity  vector,  ge(po  +  C®  p").  This  gradient  is  calcu¬ 
lated  as  [7] 

G  =  V9|p„  (2.64) 

G  =  jPp  [3p'(p‘)T  -  I]  -  (Sit)2  (2.65) 

where  pe  is  the  ECEF  position  unit  vector.  The  second  term  of  (2.63)  can  be  rear¬ 
ranged  to  facilitate  state-space  form  as  follows 

(ipx)C^fb  =  ^xfn  (2.66) 

=  -(fnx)V>  (2.67) 

Substituting  into  (2.63)  yields  the  linear  stochastic  velocity  error  model: 

5vn  =  CneGCeJpn  -  2CneCleieCen8vn  +  (fn x )  +  Cbab  +  Cnbwba  (2.68) 

Finally,  the  position  error  is  simply  the  kinematic  relationship  between  position 
and  velocity: 

<5pn  =  8wn  (2.69) 

2.6.4  State-space  Model.  The  error  dynamics  can  now  be  expressed  using 
the  standard  linear,  stochastic,  state-space  model  driven  by  white  noise  [37]: 

<5x(t)  =  F(f)x(t)  +  G(t)w(t)  (2.70) 

Note  these  error  dynamics  correspond  to  a  specific  accelerometer  and  gyroscope  error 
model.  In  this  example,  the  errors  are  modeled  by  a  time-varying  bias  plus  ran- 
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dom  noise.  Adding  or  reducing  the  complexity  of  the  error  model  would  change  the 
resulting  error  dynamics. 

The  navigation  error  state  vector,  Sx,  consists  of  position  (Vip"),  velocity  (hv"j, 
attitude  (-0),  accelerometer  bias  (ha6),  and  gyroscope  bias  errors  (hb6)  and  is  ex¬ 
pressed  as  a  vector  of  fifteen  elements: 


hpn 

hvn 


hx  = 


*l> 


8ab 


hb6 


15x1 


(2.71) 


The  driving  noise  vector,  w,  consists  of  the  noise  terms  associated  with  the 
accelerometer  measurement  (w^),  gyroscope  measurement  (w£),  accelerometer  bias 
and  gyroscope  bias  (w£  ),  and  is  expressed  as 
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(2.72) 
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The  overall  error  dynamics  in  augmented  state-space  form  are 
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15x12 


Note  that  the  error  dynamics  are  a  time-varying  function  of  the  trajectory  flown. 


2.1  Digital  Imaging 

In  this  section,  concepts  relative  to  digital  imaging  and  object  tracking  are 
presented,  including  a  discussion  of  the  optical  sensor  model,  projection  theory,  and 
fixed-object  tracking. 

2. 7. 1  Optical  Sensor  Model.  An  optical  sensor  is  a  device  designed  to  mea¬ 
sure  the  intensity  of  optical  energy  (light)  entering  the  sensor  through  an  aperture. 
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Imaging  sensors  consist  of  an  array  of  light-sensitive  detectors  which  create  a  multidi¬ 
mensional  light  intensity  measurement.  In  this  section,  the  basic  physical  properties 
of  an  optical  sensor  are  presented,  and  a  model  representing  an  optical  sensor  is  given. 

For  the  purposes  of  this  discussion,  the  world  is  defined  as  a  collection  of  ah 
real  objects.  Some  objects  are  sources  of  radiometric  illumination  or  radiance.  These 
light  sources  illuminate  the  world  and  interact  with  the  other  physical  objects  through 
various  types  of  reflection.  The  amount  of  light  propagating  along  a  certain  direction 
is  defined  as  the  irradiance  [36] .  The  physical  irradiance  pattern  entering  the  aperture 
of  the  optical  sensor  is  defined  as  the  scene  and  is  represented  by  a  time- varying  array 
of  nonnegative  rational  numbers,  0(x,  y,  t ),  corresponding  to  the  photon  arrival  rate. 

A  digital  optical  imaging  sensor  consists  of  an  aperture,  lens,  detector  array, 
and  sampling  array.  A  simple  imaging  system  model  is  shown  in  Figure  2.6.  The  lens 
focuses  the  scene  on  the  detector  array.  The  light  pattern  focused  on  the  detector 
array  is  defined  as  the  image  and  represented  by,  I (x,y,t).  In  statistical  terms,  the 
image  is  the  mean  photon  arrival  rate.  The  detector  array  converts  the  light  energy 
into  a  voltage  or  a  charge  which  is  converted  to  a  digital  value  by  the  sampling  array. 
The  sampling  array  is  assumed  to  be  a  square  grid,  although  other  patterns  can  be 
designed  (e.g.,  honeycomb)  [24], 

The  lens  is  a  natural  low-pass  filter  in  the  spatial  domain,  with  a  cutoff  fre¬ 
quency  (/c)  defined  by  the  aperture  diameter  (D).  wavelength  of  light  source  (A),  and 
focal  length  (/)  [15]: 


Thus,  a  point  source  of  light  would  appear  slightly  blurred  on  the  image  plane.  Assum¬ 
ing  spatial  invariance,  this  blurring  due  to  the  lens  is  represented  by  the  point  spread 
function ,  H (x,y),  where  x  and  y  are  the  spatial  differences  in  the  x  and  y  directions, 
respectively.  The  image  can  now  be  expressed  mathematically  as  the  convolution  of 
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ILLUMINATION 


Figure  2.6:  Imaging  system  model.  The  imaging  system  trans¬ 
forms  the  scene  into  a  digital  image.  The  major  components  of 
the  camera  are  the  optics,  light  detector,  amplifier,  and  analog 
to  digital  converter. 

the  scene  and  point  spread  function  [20]: 

I (x,y,t)=f  f  0(€,p,t)H(x-£,y- p)dpd£  (2.75) 

dfex  Jpe y 

where  X  and  Y  are  the  dimensions  of  the  image  array  in  the  x  and  y  directions, 
respectively. 

The  light  energy  in  the  image  is  integrated  over  a  period  defined  as  the  dwell 
time  (At).  The  integration  output  is  defined  as 

I  d(x,y,ti)=f  I(x,y,t)dt  (2.76) 

Jti-At 

At  the  end  of  an  integration  period,  the  accumulated  charge  of  the  sensor  is  pro¬ 
portional  to  the  number  of  photons  received,  which  is  referred  to  as  the  quantum 
efficiency  (p).  The  resulting  accumulated  charge  is  represented  by 

E(x,  y,  U )  =  pld(x,  y,  U)  (2.77) 
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Due  to  the  quantum  nature  of  light,  this  accumulated  charge  is  corrupted  by  a  random 
photon  counting  noise,  sometimes  referred  to  as  shot  noise.  The  accumulated  charge 
is  converted  to  a  voltage  by  an  amplifier  with  gain  g ,  bias  b,  and  additive  readout 
noise  parameter  wa: 

=  gE(x,y,U)  +  b  +  wa  (2.78) 

This  voltage  is  then  converted  to  a  digital  number  in  the  quantizer: 

Ba(x,y,ti)  =q[T>(x,y,ti)\  (2.79) 


which  can  be  stored  in  a  computer’s  memory. 

As  mentioned  above,  the  image  model  includes  three  sources  of  noise:  pho¬ 
ton  counting  noise,  readout  noise,  and  quantization  noise.  Photon  counting  noise  is 
modeled  as  a  Poisson  process  [48].  The  probability  distribution  function  is 

P  [E(z,  y,  U)  =  k}=  ,  ke  Z*  (2.80) 


Readout  noise  includes  the  effects  of  amplifier  thermal  noise  on  the  readout  voltages 
and  is  modeled  as  independent,  identically  distributed  zero-mean  Gaussian  distribu¬ 
tions  with  covariance  matrix  P aa.  Quantization  noise  results  from  the  errors  induced 
by  converting  a  nonnegative  real  voltage  from  the  amplifier  to  a  digital  number  with 
fixed  precision.  Assuming  a  linear  quantizer  with  a  precision  of  n  bits  and  range  of  V 
volts,  the  quantization  noise  can  be  modeled  as  a  zero- mean  uniform  distribution  [48] 
with  variance 
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(2.81) 


In  the  next  section,  a  mathematical  model  describing  the  relationship  between 
point  locations  in  the  world  and  image  is  derived. 


2.1.2  Projection  Theory.  The  camera  optical  properties  define  the  rela¬ 
tionship  between  the  scene  and  the  projected  image.  Recalling  the  simple  camera 
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Figure  2.7:  Thin  lens  camera  model.  The  thin  lens  model 

directs  parallel  light  rays  toward  the  focus,  resulting  in  an  image. 
Figure  is  not  to  scale. 


model  (Figure  2.6),  the  lens  focuses  the  incoming  irradiance  pattern  (i.e.,  scene)  onto 
the  image  plane.  For  a  theoretical  thin  lens,  the  projection  is  a  function  of  the  fo¬ 
cal  length  of  the  lens  and  the  distance  from  the  lens,  as  shown  in  Figure  2.7.  This 
relationship  is  expressed  by  the  fundamental  equation  of  the  thin  lens  [36] : 


1 

Z 


1 

7 


(2.82) 


where  Z  is  the  distance  from  the  object  to  the  lens,  z  is  the  distance  from  the  lens  to 
the  image  plane,  and  /  is  the  focal  length. 

As  the  aperture  of  the  thin  lens  decreases  to  zero,  the  system  can  be  modeled 
as  a  pinhole  camera  (see  Figure  2.8).  In  this  model,  all  incoming  light  must  pass 
through  the  optical  center  and  is  projected  on  an  image  plane  located  at  a  distance 
/  from  the  lens.  The  resulting  image  is  an  inverted  projection  of  the  scene. 

This  model  can  be  further  simplified  by  placing  a  virtual  image  plane  in  front 
of  the  optical  center,  as  shown  in  Figure  2.9.  Given  a  point  source  at  location  sc  the 
resulting  location  of  the  point  source  on  the  image  plane,  relative  to  the  optical  center 
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IMAGE 

Figure  2.8:  Pinhole  camera  model.  The  pinhole  camera  is  a 

theoretical  camera  model  where  a  thin  lens  aperture  approaches 
zero.  The  projected  image  is  inverted  on  the  image  plane. 


of  the  camera,  is  given  by 

s proj  _ 

where  scz  is  the  distance  of  the  point  source  from  the  optical  center  of  the 
the  zc  direction. 


(2.83) 
camera  in 


In  order  to  interpret  the  calculated  projection  in  a  digital  image,  the  physical 
image  plane  coordinates  must  be  converted  to  a  coordinate  system  based  on  pixel 
location.  The  following  development  defines  the  pixel  coordinate  system  and  derives 
the  transformation  from  the  physical  image  plane  to  pixel  location.  The  image  plane 
consists  of  an  (M  x  N )  grid  of  rectangular  pixels  with  height  H  and  width  W,  shown 
in  Figure  2.10.  The  origin  of  the  projection  frame  is  located  at  the  physical  center 
of  the  array.  The  origin  of  the  pixel  coordinate  system  is  located  beyond  the  upper 
left  corner  of  the  array,  such  that  the  center  of  the  upper  left  pixel  corresponds  to 
the  (1,1)  pixel  coordinate.  This  definition  of  pixel  coordinates  corresponds  to  the 
elemental  matrix  locations  when  the  image  is  stored  in  a  computer. 

The  transformation  from  the  projection  coordinates  to  pixel  coordinates  is  given 
by 
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Figure  2.9:  Camera  projection  model.  The  pinhole  camera 

model  is  modified  by  placing  a  virtual  image  plane  one  focal 
length  in  front  of  the  optical  center.  As  a  result,  this  model 
eliminates  the  image  inversion  present  in  the  standard  pinhole 
camera  model. 


Combining  Equations  (2.83)  and  (2.84)  yields  the  following  transformation  from  cam¬ 
era  frame  to  homogeneous  pixel  coordinates 
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Figure  2.10:  Camera  image  array.  The  camera  imager  consists 
of  an  (M  x  N )  array  of  pixels.  The  physical  height  and  width 
of  the  array  are  represented  by  H  and  W,  respectively. 
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where  Tpix  is  the  homogeneous  transformation  matrix  from  camera  frame  to  pixel 
frame.  The  inverse  transformation  is  calculated  as 


rr\c  _  /rr\pix\—  1 

pix  V  c  ) 


(2.87) 
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A  transformation  from  a  target  location  in  Earth-fixed  navigation  frame  coor¬ 
dinates  to  pixel  coordinates  can  now  be  derived  based  on  the  navigation  state.  The 
geometry  is  shown  in  Figure  2.11.  The  line  of  sight  vector,  s,  is  the  vector  difference 
between  the  target  location  t  and  the  camera  position,  which  are  both  available  in 
Earth- fixed  navigation  frame  coordinates: 


(2.89) 


The  resultant  vector  can  be  transformed  to  the  camera  reference  frame  using  the 
Earth- fixed  navigation-to-body  and  body-to-camera  frame  direction  cosine  matrices: 


(2.90) 


sc  =  CchCbs‘ 


Finally,  the  pixel  location  is  calculated  using  Eqn.  (2.86). 

2.7.3  Nonlinear  Optical  Distortion.  As  discussed  in  [36],  the  optical  prop¬ 
erties  of  a  lens  can  be  significantly  different  than  the  pinhole  model.  This  distortion 
is  primarily  radial  in  nature  and  can  be  identified  when  straight  lines  appear  curved 
in  the  image.  An  optical  distortion  model  is  desired  which  transforms  locations  in  a 
distorted  image  to  an  equivalent  pixel  location  which  agrees  with  the  pinhole  camera 
model. 
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Figure  2.11:  Target  to  image  transformation  geometry.  The 

relationship  between  the  camera  position,  (p),  and  target  loca¬ 
tion,  (t),  can  be  expressed  in  pixel  coordinates  using  transfor¬ 
mations  based  on  the  navigation  state  and  camera  parameters. 
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In  [36]  and  [11],  the  following  radial  distortion  correction  is  proposed: 


Zu  =  Va(r)(zd  -  c0)  +  Co  (2.91) 

where  zd  are  the  distorted  pixel  locations,  z"  are  the  undistorted  pixel  locations,  c0  is 
the  center  of  optical  distortion  (in  pixels),  and  Va(r)  is  a  polynomial  expansion  in  r. 
This  is  defined  as 

2 

Va{r)  =  l  +  J2a2ir2i  (2.92) 

2=1 

where  r  is  the  radial  distance  from  the  center  of  optical  distortion 

r  =  \\zd  —  Co||  (2.93) 

and  an  are  the  coefficients  of  the  polynomial. 

Concepts  related  to  tracking  objects  in  the  scene  are  presented  in  the  next 
section. 

2.7.4  Fixed  Object  Tracking.  As  shown  in  the  previous  section,  an  image 
is  a  transformation  based  on  the  illumination  of  objects  in  the  field  of  view.  It  is 
well  known  that  a  sequence  of  images  of  a  particular  scene  can  provide  information 
regarding  the  camera  location,  camera  orientation,  and  the  location  of  objects  in  the 
field  of  view  [36,47,62,63].  This  is  accomplished  by  identifying  and  matching  common 
objects  in  multiple  images,  which  is  known  as  feature  tracking. 

Feature  tracking  methods  typically  begin  by  analyzing  an  image  to  determine 
the  location  of  features — areas  which  are  identifiable  by  some  measure  and  which  have 
the  highest  potential  for  locating  the  matching  feature  in  subsequent  images.  The 
classic  feature  detection  algorithms  are  the  Lucas-Kanade  feature  tracker  [35]  and  the 
similar  Harris  corner  detector  [18].  These  algorithms  attempt  to  find  a  maximum 
amount  of  detail  information  by  calculating  the  eigenvalues  of  the  following  matrix 
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over  a  window,  W : 


G=  Ei,t,ew  I'c(a;>  v)  Ei,sew  y)iy(x,  y) 

^x,y£W^x(x,y)^y(Xiy)  ,y£  W  1 2y(x,y)  \ 

where  lx  and  lv  are  the  gradients  of  an  image  I  in  the  x  and  y  directions,  respectively. 
A  zero  eigenvalue  corresponds  to  zero  detail  in  the  direction  of  the  eigenvector,  which 
results  from  a  flat  image.  The  eigenvalues  increase  with  increasing  detail,  indicating 
a  potentially  distinct  feature.  Features  are  declared  by  thresholding  the  eigenvalues 
and  choosing  the  strongest  candidates.  Harris  refines  this  method  by  defining  the 
following  metric: 

C(G)  =  det{ G)  +  k  [trace2(G)]  (2.95) 

which  is  equivalent  to 

C(G)  =  (1  +  2k)a2a2  +  k{a\  +  a2)  (2.96) 

where  o\  and  a2  are  the  eigenvalues  of  G,  and  A:  is  a  tuning  parameter.  Smaller 
values  of  k  favor  detail  in  both  directions,  while  larger  values  of  k  are  more  accepting 
of  detail  in  only  one  direction  [18]. 

More  robust  feature  selection  algorithms  have  been  proposed.  An  example  is  the 
scale-invariant  feature  tracker  (SIFT)  method  developed  by  Lowe  [33].  These  SIFT 
features  are  invariant  over  scale  and  rotational  changes,  thus  providing  a  more  robust 
solution  for  arbitrary  motion.  The  SIFT  algorithm  is  described  in  further  detail  in 
Chapter  IV.  In  addition,  Bhanu  proposes  the  use  of  genetic  algorithms  for  feature 
detection  [3].  In  any  case,  identifying  robust  features  is  a  rich  problem  and  remains 
an  area  of  active  research. 

After  a  feature  corresponding  to  a  physical  object  has  been  identified  in  one 
frame,  the  same  feature  must  be  located  in  subsequent  frames.  The  most  common 
approach  defines  an  error  cost  function  designed  to  represent  the  “difference”  between 
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the  features.  A  minimization  then  determines  the  pixel  location  which  has  the  lowest 
error  cost,  resulting  in  the  best-match.  For  example,  the  classic  Lucas-Kanade  tracker 
seeks  to  minimize  the  sum  of  the  squared  differences  of  the  image  intensity  over  a 
specified  region  [35].  Given  a  sequence  of  images  (I(fi), I(fi), . . . ,  I(tn)},  assume  a 
feature  is  defined  as  a  region  Wi  on  Iff  i )  as 

F1  =  I(x,y,t1),  x,yeW1  (2.97) 

The  sum  of  squared  differences  for  the  feature  Fi  and  image  l(tn)  is 

Ci n(oi,P)=  ^2  ^2  ~  P,tn)  -  I(x,?/,ti)]2  (2.98) 

xEWi  ye Wi 

The  cost  function,  C,  is  minimized  over  the  shift  vector,  (a,  (3),  to  determine  the  best 
feature  match. 

In  this  section,  a  simple  camera  and  image  model  is  presented  which  incorpo¬ 
rates  measurement  errors.  The  model  is  then  used  to  define  feature  selection  and 
correspondence  methods  for  the  most  common  feature  tracking  methods. 

2.8  Kalman  Filtering 

In  this  section,  Kalman  filtering  concepts  are  presented,  including  a  development 
of  the  linear  Kalman  filter  equations,  the  extension  of  the  linear  equations  to  nonlinear 
problems,  and  the  benefits  of  tightly  coupling  sensors  using  a  feedback  Kalman  filter. 

2.8.1  Linear  Kalman  Filter.  The  Kalman  filter  is  a  recursive,  optimal  data 
processing  algorithm  [37].  The  goal  of  the  algorithm  is  to  estimate  the  solution  to  a 
linear  stochastic  differential  equation.  In  this  development,  the  general  form  of  the 
stochastic  differential  equation  is  explained,  and  the  Kalman  filter  propagation  and 
update  equations  are  presented. 
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The  general  form  of  the  linear  stochastic  differential  equation  is 


x  =  Fx  +  Bu  +  Gw 


(2.99) 


where  x  is  the  state  vector,  F  is  the  homogeneous  system  dynamics  matrix,  B  is  the 
input  matrix,  u  is  the  input  vector,  G  is  the  noise  transformation  matrix,  and  w  is 
a  vector  of  white  noise  processes.  The  white  noise  vector  is  a  zero  mean  Gaussian 
process  with  covariance 


E{w(f)wr(t  +  r)}  =  Q  (t)S(r) 


(2.100) 


where  5(t)  is  the  Dirac  delta  function. 

Because  the  state  vector  is  a  vector  of  random  elements,  the  solution  to  the 
stochastic  differential  equation  is  a  time-varying  probability  density  function.  The 
Kalman  filter  derivation  assumes  Gaussian  distributions  for  all  random  variables,  thus 
a  complete  characterization  of  the  probability  density  function  requires  calculating 
only  the  mean  and  covariance  of  the  state  vector  [66]. 

The  mean  of  the  state  vector  is  defined  as 


(2.101) 


to 


where  &(t,t0)  is  the  state  transition  matrix  from  time  t0  to  time  t,  and  E{ - }  is 
the  expectation  operator  [37].  Note  that  this  is  equivalent  to  the  solution  of  the 
deterministic  linear  differential  equation. 

The  covariance  of  the  state  vector  is  defined  as  the  difference  between  the  mean 
square  value  and  the  mean: 


Pxx(t)  =  E{x(f)xT(f)}  -  mx(t)mTx(t) 


(2.102) 
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The  state  covariance  matrix  expressed  as  a  function  of  the  system  dynamic  parameters 


is 


Pxx(t)  =  $(t,t0)Pxx(t0)$T(t,t0)  +  f  $(t,r)G(r)Q(r)GT(r)$r(t,r)dr  (2.103) 

■  'to 

Because  the  state  vector  is  defined  as  a  Gaussian  process,  the  maximum  a 
posteriori  (MAP)  estimate  of  the  state  is  simply  the  mean  [66].  The  MAP  state 
estimate  is  denoted  with  the  hat  notation: 

x(t)  =  m  x(t)  (2.104) 

The  covariance  is  an  estimate  of  the  uncertainty  in  the  state  estimate  and  is  defined 
using  the  hat  notation: 

P  (t)  =  Pxx(t)  (2-105) 

In  addition  to  the  state  dynamics,  the  Kalman  filter  normally  incorporates  dis¬ 
crete  measurements.  The  measurements  are  random  vectors  modeled  as  linear  func¬ 
tions  of  the  state  vector  with  additive  noise: 


z  (U)  =  H(tj)x(tj)  +  v(ti)  (2.106) 

where  H(£j)  is  the  observation  matrix  and  vft,)  is  a  zero- mean  white  Gaussian  noise 
vector  with  covariance  defined  by 


E  {v(ti)vT(tj)}  =  R  (ti)Sij 


(2.107) 


where  A,  is  the  Kroeneker  delta  function. 

Incorporating  the  system  dynamics  and  update  models  into  an  algorithm  results 
in  two  distinct  operations:  measurement  updates  and  propagation.  Assume  measure¬ 
ments  are  available  at  discrete  times,  (ti,ti+ 1, . . . ,  ti+n),  and  estimates  of  the  mean 
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and  covariance  are  available  immediately  prior  to  time  tt  (denoted  by  t-  .)  The  mean 
and  covariance  after  incorporating  the  update  (denoted  by  tf),  is  defined  as  [37] 

x(t+)  =  x(fr)  +K(t()  [z((j)  -H((i)x(«r)]  (2.108) 

P(t+)  =  P(t-)  -  KfeJHfeJPft-)  (2.109) 

where  K(b)  is  the  Kalman  gain  matrix,  defined  by 

K(U)  =  P(t~)HT(ti)  [H (ti)P(tr)BT{ti)  +  Rfe)]”1  (2.110) 

After  the  update  at  time  U  has  been  completed,  the  state  estimate  and  covari¬ 
ance  are  propagated  to  the  next  update  time,  ti+ 1,  using  the  solution  to  the  stochastic 
differential  equation  previously  derived: 

x(^~+1)  =  &(ti+uU)Z(tt+i)+  [  ^{ti+1,r)B(r)u(r)dT  (2.111) 

Jti 

+  f  ^(tm,r)G(r)Q(r)GT(r)$(tj+i,r)Tdr  (2.112) 

Ju 

The  combination  of  the  update  and  propagation  equations  define  a  recursive  algorithm 
which  maintains  an  estimate  of  the  state  and  the  state  uncertainty  based  on  a  system 
dynamics  model  and  incorporating  measurements. 

2.8.2  Extended  Kalman  Filter.  The  linear  Kalman  filter  developed  in  the 
previous  section  is  based  on  a  linear  system  model  driven  by  white  Gaussian  noise. 
While  this  is  an  appropriate  description  of  many  real-world  systems,  it  is  not  ade¬ 
quate  for  some  problems  [38].  These  nonlinear  problems  motivate  the  development 
of  the  extended  Kalman  filter  (EKF).  The  development  begins  with  a  definition  of  a 
nonlinear  system  model. 
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The  proposed  nonlinear  system  model  is  defined  as  the  following  stochastic 
differential  equation  driven  by  white  Gaussian  noise: 

x(£)  =  f  [x(£),  u(t),£]  +  Gw(£)  (2.113) 

Noise-corrupted,  discrete  measurements  are  available  in  the  following  form: 


z(£j)  =  h[x(£j),£j]  +  v(£j)  (2.114) 

Perturbation  techniques  are  used  to  approximate  the  system  about  a  nominal 
trajectory.  This  results  in  linear  stochastic  equations  which  can  be  solved  using  the 
traditional  linear  Kalman  filter  propagation  and  update  equations  [38]. 

The  derivation  proceeds  by  defining  a  nominal  trajectory  and  estimating  the 
errors  about  the  nominal  trajectory.  Thus,  the  perturbation  model  for  the  state 
vector  is  defined  as  the  sum  of  a  nominal  trajectory  and  an  error  state: 

x(t)  =  x(£)  +  Sx(t)  (2.115) 

The  nominal  trajectory  is  defined  as 

x(£)  =  f  [x(t),  u(£),  t]  (2.116) 

Applying  the  error  modeling  procedure  presented  in  Section  2.4  yields  the  following 
approximation  to  the  nonlinear  stochastic  dynamics  model: 

x(t)  +  <5x(£)  =  f  [x(f),  u(£),  t]  +  F (£)<5x(£)  +  Gw(t)  (2.117) 

which  is  the  sum  of  the  deterministic  nominal  trajectory  (2.116)  and  a  stochastic 
linear  differential  equation 


5x.(t)  =  F  (t)Sx.(t)  +  Gw(t) 


(2.118) 
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The  state  estimate  is  defined  as  the  sum  of  the  nominal  trajectory  and  the 
estimated  perturbation: 

x(t)  =  x(t)  +  Sx(t)  (2.119) 

The  state  error  covariance  is  denoted  by  P  (t). 

To  propagate  from  time  to  t~+1,  the  nominal  trajectory  is  calculated  using 
a  nonlinear  differential  equation  solver  [50],  with  the  state  estimate  at  as  the 
initial  condition.  The  covariance  is  propagated  using  the  time  propagation  equations 
presented  in  the  previous  section. 

To  incorporate  the  measurement  update  at  ti+ 1,  the  nonlinear  update  Equa¬ 
tion  (2.114)  is  linearized  about  the  nominal  trajectory,  x(tjlhl): 


sfc+i)  =  h  [x(ti+1),ti+1]  +  H(ti+i)Sx(ti+i)  +  v(ti+ 1)  (2.120) 


where  H  (t)  is 


H«  =  1 


<■(*), t 


dh 1 

dh 1 

dx\ 

dxn 

dhn 

dhn 

dx\ 

dxn 

(2.121) 


x(t),t 


Defining  the  measurement  as  the  sum  of  a  nominal  and  an  error  component 


z(ti+1)  =  z(ti+i)  +  Sz(ti+1)  (2.122) 

and  subtracting  the  nominal  component  from  (2.120)  yields  the  perturbation  update 
equation: 

8z(ti+1)  =  H(ti+i)<5x(ii+i)  +  v(ti+ 1)  (2.123) 

The  post-measurement  error  state  vector,  5x(tjffl),  and  covariance,  P(tjffl),  are 
calculated  using  the  Kalman  update  equations  shown  in  the  previous  section. 


51 


Finally,  the  current  error  state  estimate  is  incorporated  into  the  nominal  tra¬ 


jectory: 

x(X++i)  =  x(X“+1)  +  Sx(tt+1)  (2.124) 

the  error  state  estimate  is  reset  to  zero  and  the  process  is  repeated. 

2.9  Optimal  Batch  Estimation 

In  this  section,  the  method  of  optimal  estimation  using  batch  measurements  is 
presented,  focusing  on  the  problem  of  estimating  parameters  given  nonlinear  mathe¬ 
matical  models. 

2.9.1  Linear  Mathematical  Models.  The  concept  of  optimal  batch  estima¬ 
tion  is  introduced  for  the  case  where  the  observation  function  is  well-modeled  using 
a  set  of  linear  stochastic  functions.  The  linear  problem  is  defined  using  the  following 
mathematical  model  [72]: 

z  =  Hx  +  v  (2.125) 

where  H  is  the  design  (or  influence)  matrix,  x  is  the  solution  vector,  z  is  the  obser¬ 
vation  vector,  and  v  is  the  residual  vector. 

The  goal  of  the  regression  is  to  estimate  the  values  of  x  which,  given  H  and 
z,  minimize  the  sum-of-squares  of  the  estimated  residual  vector,  weighted  by  the 
uncertainty  of  the  measurement  error: 


min  (vR  1vT)  (2.126) 

where  R  is  the  measurement  error  covariance  kernel 

R  =  E  [vvT]  (2.127) 
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where  E  [•]  is  the  expected  value  operator.  The  resulting  expression  which  minimizes 
the  sum-of-squares  of  the  residual  vector  is  shown  to  be  [72] 

x  =  (HtR-1H)-1  HtR-1z  (2.128) 

where  the  matrix  (HTR_1H)  is  defined  as  the  matrix  of  the  normal  equations.  A 
unique  solution  exists  if  and  only  if  the  matrix  of  normal  equations  is  nonsingular. 

In  the  next  section,  the  optimal  batch  estimation  concept  is  expanded  to  treat 
nonlinear  mathematical  models. 

2.9.2  Nonlinear  Mathematical  Models.  In  many  cases,  linear  models  are 
inadequate  for  system  modeling,  and  a  nonlinear  system  model  is  required.  This  is 
the  situation  for  many  systems,  including  image  and  inertial  navigation  solutions.  A 
general  nonlinear  model  which  is  useful  for  working  with  imaging  systems  is  defined 
in  [72]  as 

h  (x,  z)  =  0  (2.129) 

where  h  is  a  nonlinear  system  function,  x  is  a  vector  of  system  parameters,  and  z  is 
a  vector  of  (perfect)  observations. 

In  order  to  estimate  the  unknown  system  parameter  vectors,  perturbation  tech¬ 
niques  are  employed.  The  model  is  linearized  about  nominal  system  parameter  vectors 
which  are  defined  as 


x  =  x  +  fix  (2.130) 

z  =  z  +  fiz  (2.131) 

where  x  represents  an  initial  approximation  of  the  system  parameters,  and  z  is  the 
vector  of  (noise-corrupted)  observations.  Substituting  equations  (2.130)  and  (2.131) 
into  (2.129)  yields 

h  (x  +  fix,  z  +  fiz)  =  0  (2.132) 
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Using  perturbation  techniques  (see  Section  2.4)  Equation  (2.132)  is  linearized  about 
the  nominal  state,  resulting  in  the  following  first-order  approximation  (i.e.,  assuming 
h  is  continuously  differentiable  in  x  and  z): 


h  (x,  z)  +  Hx5x  +  Hz5z  fa  0  (2.133) 

where 

(2.134) 

X.Z 

(2.135) 

X.Z 

Given  a  vector  of  noise-corrupted  observations  and  approximated  system  parameters, 
the  misclosure  vector  (w)  is  defined  as 


tu 

H2 


<9h 

dx 

<9h 

dz 


w  =  h  (x,  z) 


(2.136) 


Substituting  Equation  (2.136)  into  (2.133)  yields  the  linearized  form  of  the  re¬ 
gression: 


w  +  Hxhx  +  H2hz  ~  0 


(2.137) 


Assuming  there  is  no  prior  information  regarding  the  system  parameters,  the 
estimated  parameter  error  vector  which  minimizes  the  weighted  sum-of-squared  errors 
is  given  by  [72] 


Sx  =  — 


htx  (h2rhJ)_1  h,  (h2rhJP  w 


-1 


T\~1  . 


(2.138) 


where  R  is  the  measurement  error  covariance  kernel,  defined  in  the  previous  sec¬ 
tion  (2.9.1).  The  estimated  measurement  residual  errors  are  given  by 


Sz  =  — RHj  (HzRHj)  1  (iijx  +  w) 


(2.139) 
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Recalling  equations  (2.130)  and  (2.131),  the  estimated  system  parameters  and 
measurement  residual  errors  are  then  added  to  their  respective  nominal  parameter 
vectors  to  produce  the  whole- valued  state  estimates: 

x  =  x  +  fix  (2.140) 

z  =  z  +  fiz  (2.141) 

In  case  where  there  is  prior  knowledge  regarding  the  values  of  the  system  pa¬ 
rameters,  and  this  uncertainty  is  represented  by  the  covariance  kernel  Pxx,  defined 
as 

Pxx  =  E  [fixfixT]  (2.142) 

the  regression  equation  (2.138)  is  modified  such  that  the  prior  knowledge  is  properly 
weighted 


fix  =  — 


p-i  +  Hf  (HRH!)  hJ  H'  (H  RH'  !  w 


(2.143) 


For  cases  where  the  solution  (x)  converges  slowly,  the  process  can  be  repeated 
using  the  whole- valued  state  estimate  to  calculate  the  partial  derivatives.  This  iter¬ 
ative  technique  can  improve  the  accuracy  of  future  estimates,  however  local  conver¬ 
gence  is  not  guaranteed  unless  Lyapunov  stability  can  be  shown  [45] .  For  a  complete 
discussion  of  least-squares  techniques,  see  [72],  [30],  and  [67]. 


2.10  Direction  Cosine  Perturbation  Methods 

In  order  to  use  perturbation  methods,  the  calculation  of  partial  derivatives  is 
required.  While  partial  derivative  calculations  for  vectors  is  straightforward,  this  is 
not  necessarily  the  case  for  direction  cosine  matrices.  This  issue  is  addressed  by 
representing  small  angles  using  a  three  dimensional  vector.  From  [64],  a  direction 
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cosine  matrix  can  be  approximated  by 


C£  =  [I  +  0x]  (2.144) 

where  6  is  a  three-element  vector  of  small  angles. 

Noting  the  multiplicative  nature  of  direction  cosine  matrices  allows  the  expres¬ 
sion  of  any  direction  cosine  matrix  as  the  product  of  a  direction  cosine  matrix  and 
a  small  rotation.  For  example,  the  arbitrary  direction  cosine  matrix,  C“,  can  be 
expressed  as 

Cac  =  [I  +  0x}Cbc  (2.145) 

This  property  is  ideal  for  expressing  small  angular  errors  of  the  navigation  system 
based  on  an  estimated  direction  cosine  matrix  of  interest  which  simplifies  the  calcula¬ 
tion  of  partial  derivatives  and  reduces  the  number  of  states  in  the  filter.  This  concept 
is  illustrated  using  an  example. 

A  line  of  sight  vector  of  interest,  sc,  is  given  by 

sc  =  CcbCbn  [yn  -  pn]  (2.146) 

The  true  body  to  navigation  frame  DCM  can  be  expressed  equivalently  as  the  product 
of  a  calculated  body  to  navigation  frame  DCM  and  a  small  angular  change  due  to 
errors: 

Cf  =  [I-V>x]CJ  (2.147) 

Substituting  Eqn.  (2.147)  into  (2.146)  yields 

sc  =  C£C|  [I  +  -0x]  [yn  —  pn]  (2.148) 

Now  calculation  of  the  partial  derivative  of  the  attitude  error  vector,  -0,  is  straight¬ 
forward: 

osc  a 

^  =  ^(c*c*w’x)|y”+p”|)  (2-149) 
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Rearranging  the  skew-symmetric  form  yields 


r) 

di,  =  ^(-c£c«<[y"-  p“]x)*)  (2.150) 

r) 

tt  =  -CJC|([y“-p"]x)  (2.151) 


This  partial  derivative  can  now  be  used  in  an  estimation  algorithm.  Once  an 
estimate  of  the  small  angle  is  available,  the  resulting  estimate  of  the  direction 
cosine  matrix  is 


C 


n 

b 


I  —  TpX 


Cn 

b 


(2.152) 


In  the  next  chapter,  an  analysis  of  the  current  state-of-the-art  image-based 
navigation  techniques  is  presented. 
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III.  Navigation  Using  Imaging  and  Inertial  Sensors  -  An 
Analysis  of  the  State  of  the  Art 


Navigation  using  images  and  inertial  measurements  is  an  area  of  active  research. 

This  chapter  describes  the  current  state  of  the  art  and  establishes  a  context  for 
this  work. 

3. 1  Early  Methods 

The  use  of  various  types  of  optical  measurements  for  navigation  is  not  new. 
In  fact,  some  of  the  oldest  navigation  techniques  are  based  on  optical  measurements 
of  fixed  objects.  In  this  section,  the  classic  techniques  of  celestial  navigation  and 
driftmeters  are  presented. 

3.1.1  Celestial  Navigation.  The  mathematical  predictability  and  observ¬ 
ability  of  the  heavens  provide  an  excellent  source  of  navigation.  Celestial  navigation 
is  one  of  the  oldest  known  methods  for  long-distance  navigation  and  was  practiced  by 
the  world’s  earliest  global  travelers,  the  Phoenicians  and  the  Polynesians  [10].  Early 
navigators  used  the  sky  to  determine  direction  by  observing  the  stars  rising  in  the 
east,  setting  in  the  west  and  rotating  about  the  north.  Later,  development  of  equip¬ 
ment  capable  of  measuring  the  angles  between  the  horizon  and  stars  (e.g.,  astrolabe, 
sextant),  combined  with  precision  timekeeping,  enabled  navigators  to  determine  lat¬ 
itude  and  longitude  directly  [5].  In  this  section,  the  mathematics  related  to  celestial 
observations  are  developed. 

In  simplest  terms,  the  celestial  observation  geometry  is  shown  in  Figure  3.1.  The 
celestial  observation  is  a  measurement  of  the  angle  from  a  star  to  the  local  vertical 
vector.  This  angle  describes  a  circular  line  of  potential  positions  centered  on  the 
projection  of  the  star  of  the  Earth’s  surface.  The  location  of  the  star  is  described  by 
the  meridian  angle  and  declination  angle,  A t  and  0t,  respectively,  from  the  inertial 
reference  frame.  The  location  can  also  be  described  by  the  unit  vector  t*. 
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Figure  3.1:  Celestial  observation  geometry.  The  celestial  ob¬ 
servation  is  a  measurement  of  the  angle  from  a  star  to  the  local 
vertical  vector.  This  angle  describes  a  circular  line  of  potential 
positions  centered  on  the  projection  of  the  star  of  the  Earth’s 
surface.  The  location  of  the  star  is  described  by  the  meridian 
angle  and  declination  angle,  At  and  <pt-  respectively,  from  the 
inertial  reference  frame.  The  location  can  also  be  described  by 
the  unit  vector  t\ 


59 


The  generalized  location  of  a  celestial  body  (e.g.,  star,  planet,  satellite,  etc.) 
can  be  expressed  in  the  Earth-centered  inertial,  or  i,  frame  as  the  vector  t'.  The 
position  vector  can  be  expressed  equivalently  using  a  unit  vector  as 

V  =  DV  (3.1) 

where  D  =  ||t*||.  The  line  of  sight  vector  to  the  object  in  the  ECEF  frame,  se,  is 

se  =  -  pe  (3.2) 

where  pe  is  the  location  of  the  sensor  in  the  e  frame.  And  in  the  camera  frame 

sc  =  DCce CfV  -  Ccepe  (3.3) 

The  line  of  sight  vector  is  normalized  to  homogeneous  coordinates  by  dividing  by  the 
z  component: 

sc  = 

and  then  transformed  to  pixel  coordinates: 

gnx  =  Tpi*gC  (3.5) 

where  Tpix  is  the  intrinsic  camera  projection  matrix.  Adding  pixel  measurement  noise 

defines  a  measurement  equation  of  the  form 

=  h  (pe,  C^,  te,  +  v  (3.6) 

The  measurement  can  be  simplified  by  noting  the  distance  to  a  star  is  much 
greater  than  the  radius  of  the  Earth  (e.g.,  p e /D  m  0.)  Returning  to  Equation  (3.2) 


,pix 


Qpix 

% 
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and  applying  the  infinite  distance  approximation,  yields 


(3.7) 


which  effectively  removes  pe  from  the  measurement.  The  resulting  simplified  mea¬ 
surement  is 


h(Ceb,te,T^x)  +v 


(3.8) 


This  shows  celestial  measurements  are  primarily  useful  for  obtaining  precise  body  ori¬ 
entation.  This  orientation  measurement  can  be  combined  with  inertial  measurements 
to  estimate  position. 

3.1.2  Driftmeter  Navigation.  Prior  to  the  invention  of  the  inertial  navigation 
system,  the  driftmeter  was  used  by  bomber  crews  in  World  War  II  to  determine 
groundspeed  by  tracking  the  apparent  drift  of  objects  on  the  ground  [65].  Later,  a 
method  using  optical  measurements  of  the  lunar  surface  was  suggested  to  provide 
navigation  information  for  the  Apollo  missions  [39]. 

3.2  N on-inertial  Optical  Navigation  Methods 

Although  this  research  is  focused  on  the  fusion  of  optical  and  inertial  mea¬ 
surements,  a  discussion  of  non-inertial  optical  navigation  methods  illustrates  relevant 
optical  navigation  techniques. 

Hagen  [17]  proposes  an  extended  Kalman  filter  for  estimating  an  aircraft’s  po¬ 
sition  and  orientation  using  optical  measurements  and  a  known  topographic  map  of 
elevations.  The  algorithm  uses  feature  detection  techniques  to  select  and  track  so- 
called  “tokens”  from  a  video  sequence.  These  tokens  are  defined  as  a  collection  of 
circular  integrals  of  pixel  intensities: 


(3.9) 
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at  various  radii  (jy)  which  are  invariant  to  linear  shifts  and  rotations.  The  candidate 
tokens  are  then  pruned  to  maximize  a  uniqueness  measure  which  ensures  the  tokens 
are  not  too  closely  spaced  and  the  terrain  around  the  token  is  linear.  Both  of  these 
constraints  are  designed  to  improve  the  stability  of  the  extended  Kalman  filter. 

The  tokens  are  then  tracked  using  a  correlation  algorithm  which  minimizes  the 
norm  of  the  difference  between  the  original  token  and  the  current  image.  Search 
efficiency  was  improved  by  constraining  the  search  area  using  the  current  navigation 
state  estimate  and  covariance. 

A  different  approach  to  estimating  position  using  aerial  images  is  proposed  by 
Sim  [55,56]  and  generalized  by  Lerner  [32].  The  method  begins  by  calculating  a 
number  of  correspondence  vectors,  arranged  in  a  grid  pattern,  between  a  pair  of 
images.  The  resulting  vector  array  is  defined  as  the  optical  flow.  The  optical  flow 
array  is  then  compared  to  a  digital  terrain  model  corresponding  to  the  scene.  This 
results  in  two  different,  but  complementary,  updates.  The  average  optical  flow  vector 
is  treated  as  a  velocity  measurement  and  integrated  to  estimate  position.  Because 
the  position  is  not  measured  directly,  it  is  subject  to  drift  over  time.  The  second 
measurement  type  correlates  the  optical  flow  pattern  with  the  predicted  optical  flow 
pattern  calculated  using  the  digital  terrain  model,  similar  to  the  Terrain  Contour 
Matching  (TERCOM)  technique  for  cruise  missile  guidance  [14].  This  measurement 
provides  a  direct  measurement  of  position  relative  to  the  Earth  and  serves  to  eliminate 
long-term  drift.  The  resulting  navigation  solution  uses  an  ad  hoc  approach  to  integrate 
the  two  measurements.  The  trajectory  is  then  estimated  using  an  iterative,  nonlinear 
solver. 

The  effects  of  sensor  geometry  for  navigation  and  obstacle  avoidance  were  stud¬ 
ied  in  [23].  In  this  research,  an  autonomous  helicopter  was  equipped  with  two  side¬ 
looking  cameras  and  one  omnidirectional  video  camera  and  flown  through  paths  be¬ 
tween  tall  buildings  (i.e.,  urban  canyons).  The  concept  was  motivated  by  the  theorized 
use  of  optical  flow  by  insects  for  relative  navigation  and  obstacle  avoidance  [59] .  The 
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vehicle  was  able  to  reliably  determine  the  distance  from  each  wall  using  the  optical 
flow  measurement. 

This  is  related  to  a  developing  field  which  is  attempting  to  use  biological  vi¬ 
sion  research  to  navigate  using  optical  flow.  Interestingly,  the  insect  compound  eye 
and  tangential  neuron  structure  has  been  shown  to  directly  provide  indications  of 
optical  flow  patterns  tuned  to  specific  elementary  motions  [8, 13,31].  This  discovery 
has  spawned  a  series  of  biomimetic  or  bio-inspired  techniques  for  guidance  and  nav¬ 
igation.  In  addition,  the  parallel-friendly  nature  of  optic-flow  processing  has  led  to 
the  development  of  embedded  very-large  scale  integration  (VLSI)  components  with 
imaging  sensors  with  the  goal  of  providing  elementary  motion  estimates  [19,21], 

The  advantages  of  incorporating  a  stereo  imaging  sensor  for  robot  navigation 
over  uncertain  terrain  was  investigated  in  [41,42],  The  system  modeled  the  locations 
of  landmarks  as  a  Gaussian-distributed  random  variable  and  estimation  algorithms 
were  developed  using  an  optimal  maximum-likelihood  motion  estimation  model  versus 
the  less-optimal  least-squares  formulation.  Initial  simulation  results  demonstrated 
nonlinear  error  growth  rate  for  optical-only  navigation.  Incorporating  an  attitude 
sensor  reduced  the  error  growth  significantly. 

This  research  also  showed  innovation  in  the  method  used  to  select  and  track 
features.  The  features  were  selected  to  maximize  the  Forstner  interest  operator  [12] 
with  detail  in  the  direction  parallel  to  both  cameras.  This  is  equivalent  to  tracking 
points  which  will  provide  the  most  accurate  depth  measurement.  In  addition,  the 
predicted  feature  location  is  constrained  using  an  ad  hoc  method  based  on  assuming 
a  relatively  flat  terrain  model.  Finally,  the  beneficial  effects  of  tracking  objects  in 
multiple  images  for  stabilizing  and  improving  long-distance  navigation  is  noted. 

Applying  Bayesian  methods  to  solve  the  correspondence  between  features  in 
multiple  images  was  approached  by  Bedekar  [2],  This  maximum  a  posteriori  (MAP) 
estimator  solves  for  the  best  correspondence  estimate  while  accounting  for  random 
camera  errors  and  random  feature  location  error.  Let  the  vectors  z^,  z2j, . . . ,  znj 
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represent  the  observed  feature  locations  for  given  feature,  j,  in  image  coordinates 
from  a  sequence  of  images  Ii,  I2, . . . ,  In,  respectively.  The  true  navigation  state  vector 
for  each  image  Ii,  I2, . . . ,  In  is  given  by  xi,  x2, . . . ,  xn,  respectively.  The  estimated 
navigation  state  vector  xn  is  modeled  as  a  Gaussian  random  vector  with  mean  equal 
to  the  true  navigation  state  vector  and  covariance  Pn.  The  true  location  of  each 
feature,  j,  in  world  coordinates  is  given  for  each  epoch  by  q^,  q2 j, . . . ,  q mj. 

The  observed  feature  locations  can  now  be  expressed  as  a  function  of  the  true 
location  in  world  coordinates  and  the  navigation  state: 

z  ij  =  P(xi,  q,)  +  Vij  (3.10) 

where  P(-,  •)  is  the  function  which  projects  features  into  the  image  plane  based  on  the 
navigation  state  and  is  a  random  measurement  noise.  The  triangulation  problem 
is  solved  by  choosing  q  and  x  which  maximizes  the  a  posteriori  probabilities: 

p(qi|zii,z2j,...,znj,xi,x2,...,xn)  j  =  1,2,...,  Mi  (3.11) 

over  the  various  correspondence  combinations  between  true  feature  locations  and  the 
locations  in  the  image. 

This  approach  is  important  because  it  provides  the  basis  for  treating  the  navi¬ 
gation  state  and  the  feature  location  measurements  as  random  variables,  resulting  in 
a  navigation  state  estimate  which  is  statistically  rigorous.  This  concept  could  be  ex¬ 
tended  into  a  stochastic  process  by  incorporating  a  navigation  state  dynamics  model 
and  additional  measurements. 

3.3  Inertially- aided  Optical  Navigation 

In  this  section,  relevant  approaches  for  inertially-aided  optical  navigation  are 
described  and  analyzed. 
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3.3.1  Bhanu,  Roberts,  and  Ming.  Integrating  inertial  measurements  with  a 
digital  image  sequence  to  enhance  object  avoidance  was  proposed  by  Bhanu  in  [52] 
and  [4],  This  system  leverages  inertial  measurements  to  calculate  the  translation 
and  rotation  between  images  to  help  solve  the  correspondence  problem  and  estimate 
object  distance.  While  the  method  is  not  directly  related  to  solving  the  navigation 
problem,  there  are  a  number  of  similarities  which  motivate  a  discussion. 

The  algorithm  consists  of  seven  steps  which  are  executed  between  pairs  of  images 
taken  at  times  ta  and  4- 

1.  Calculate  the  navigation  state,  x(4)  and  x(4)  corresponding  to  images,  1(4) 
and  1(4),  respectively. 

2.  Select  interest  points,  z(ta)  and  z (4),  from  each  frame. 

3.  Locate  the  focus  of  expansion  in  each  image. 

4.  Project  the  focus  of  expansion  and  interest  points  onto  image  plane  parallel  to 
image  a. 

5.  Match  the  interest  points  in  frame  b  to  frame  a. 

6.  Compute  the  range  to  each  interest  point. 

7.  Create  a  dense  map  of  range  to  obstacles. 

The  interest  points  (i.e.,  features)  are  calculated  using  a  derivative  of  the  Harris 
feature  detector  with  k  parameter  equal  to  zero  (see  Eqn.  2.95).  The  focus  of  expan¬ 
sion  is  the  projection  of  the  position  difference  between  times  4  and  4  onto  the  image 
plane  at  time  4-  This  focus  of  expansion  vector  is  the  common  vanishing  point  for 
stationary  objects  due  to  translational  camera  motion. 

The  interest  points  in  frame  b  are  then  “derotated”  into  an  image  plane  parallel 
to  frame  a  using  the  change  in  attitude  detected  by  the  inertial  sensor  and  the  pro¬ 
jection  operator.  This  establishes  an  equivalent  theoretical  image  space  which  results 
from  a  purely  translational  motion.  This,  combined  with  the  projection  of  the  focus 
of  expansion,  allows  the  algorithm  to  more  easily  establish  correspondence  between 
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interest  points  in  both  images.  The  derotation  operation  is  expressed  by  the  following 
equation: 


(4)  =  C?C"z«(4) 

(3.12) 

(4)  =  C>‘>(4) 

(3.13) 

where  C£“  is  the  DCM  from  camera  frame  at  time  ta  to  time  4>  C£“  is  the  DCM 
from  the  navigation  frame  to  the  camera  frame  at  time  ta,  C”  is  the  DCM  from  the 
camera  frame  at  time  4  to  the  navigation  frame,  zCb  (4)  and  zc"  (4)  are  the  location  of 
interest  points  at  time  4  coordinitized  in  the  q,  and  ca  frames,  respectively.  The  focus 
of  expansion,  p^,  is  simply  the  rotation  of  the  position  difference  between  epochs  ta 
and  4  into  the  camera  frame  at  time  4 


PS  =  CgpS  (3.14) 

Once  the  image  is  derotated  and  the  focus  of  expansion  is  established,  the  cor¬ 
respondence  between  interest  points  is  calculated  using  goodness-of-fit  metrics.  One 
relevant  metric  is  the  correspondence  search  constraint  placed  on  each  point.  This 
constraint  ensures  each  interest  point  lies  in  a  cone-shaped  region,  with  apex  at  the 
focus  of  expansion,  bisected  by  the  line  joining  the  focus  of  expansion  and  the  interest 
point  in  the  camera  frame  at  ta.  While  this  constraint  is  not  statistically  rigorous,  it 
does  show  the  value  of  using  inertial  measurements  to  aid  the  correspondence  prob¬ 
lem,  which  is  an  important  contribution.  The  second  contribution  of  this  work  is  the 
concept  of  applying  transformations  to  the  image  based  on  inertial  measurements  to 
produce  a  motion  model  which  is  more  tractable.  In  this  example,  a  purely  transla¬ 
tional  model  was  preferred;  however,  other  transformations  are  possible. 

3.3.2  Roumeliotis,  Johnson,  and  Montgomery.  Another  interesting  proposal 
for  augmenting  inertial  navigation  with  image-based  motion  estimation  is  presented 
in  [53].  In  this  paper,  the  problem  of  landing  an  interplanetary  probe  on  Mars  is 
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addressed.  The  approach  combines  measurements  from  an  inertial  sensor,  image 
sensor,  and  laser  altimeter  using  an  extended  Kalman  filter.  What  makes  this  work 
unique  is  the  direct  exploitation  of  scene  homography  for  use  as  updates  to  a  filter. 
In  other  words,  based  on  an  assumed  relatively  flat  scene  (e.g.,  looking  down  at  the 
ground  during  descent),  the  feature  tracking  algorithm  can  produce  a  measurement 
of  pitch  and  roll. 

A  comparison  of  homographic  techniques  is  presented  in  [1],  where  a  homogra¬ 
phy  is  defined  as  a  3  x  3  matrix  capable  of  describing  the  projective  transformation 
of  points  viewed  on  a  plane  between  images.  In  order  to  compute  a  homography,  a 
minimum  of  four  point  correspondences  from  objects  on  a  plane  are  required.  As  can 
be  expected,  homographic  methods  work  the  best  when  the  scene  is  well- modeled  as 
a  plane;  however,  the  method  can  be  extended  to  arbitrary,  but  known,  structures. 

3.3.3  Pachter,  Polat,  and  Porter.  A  theory  for  INS  aiding  by  tracking  a 
ground  object  at  unknown  location  is  presented  in  [44,46,47].  In  this  approach,  an 
aircraft  is  equipped  with  a  telescope  with  a  gimbal  mounting  which  enables  the  user 
to  measure  angles  and  angular  rates  to  a  fixed  ground  target,  similar  to  a  modern 
driftmeter  [65]. 

For  clarity,  the  initial  derivation  is  performed  using  a  two-dimensional  profile. 
Assuming  a  straight  flight  path  with  constant  angle  of  attack,  the  location  of  the 
unknown  target  and  the  aircraft  velocity  is  estimated  using  measurements  of  the  initial 
deflection  angle  and  angular  rate  from  the  driftmeter.  Once  these  parameters  are 
estimated,  the  angle  from  the  aircraft  velocity  vector  to  the  initial  line  of  sight  (LOS) 
to  the  target  is  calculated.  Assuming  a  non-accelerating  or  rotating  flight  path,  the 
angle  of  attack  can  then  be  estimated  directly  from  future  angle  measurements  to  the 
target.  Finally,  this  angle  of  attack  measurement  is  used  to  estimate  the  navigation 
state  using  inertial  measurements  and  a  weighted  least  squares  formulation.  This 
illustrates  a  mathematical  technique  for  estimating  angle  of  attack  and  sideslip  using 
optical  and  inertial  measurements  for  non-maneuvering  flight. 
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3.3.4  Raquet  and  Giebner.  Raquet  proposes  an  extended  Kalman  filter 
which  combines  optical  measurements  of  an  object  at  unknown  location  with  inertial 
and  barometric  altimeter  measurements  [51].  In  the  following  paragraphs,  an  overview 
of  the  filter  derivation  and  the  flight  test  results  are  presented. 

The  filter  is  a  tightly-coupled,  feed-forward  design,  error  state  model,  commonly 
used  in  inertial  navigation  systems  [37].  A  block  diagram  of  the  filter  is  shown  in 
Figure  3.2.  As  mentioned  previously,  the  filter  estimates  the  errors  in  the  navigation 
state  produces  by  the  INS.  This  perturbation  model  reduces  the  effects  of  unmodeled 
super-linear  dynamics  and  improves  performance  [37,38].  Thus,  the  state  vector,  fix, 
which  includes  the  inertial  error  states,  can  be  defined  as 

Svn 

<5x  =  3hbaro 

tn 


where  hp",  6vn,  and  tp  are  the  position,  velocity,  and  attitude  errors,  respectively. 
The  barometric  altimeter  error,  Shbaro  is  modeled  as  a  first-order  Gauss-Markov  pro¬ 
cess  [37].  The  three  dimensional  target  location  estimates  for  n  ground  targets  are 
represented  by  t”  . . .  t”.  The  targets  are  modeled  as  stationary  (zero  dynamics)  with 
a  small  additive  random  walk  (process  noise)  to  avoid  filter  lock-out  caused  by  a 
covariance  eigenvalue  approaching  zero  [37] . 

The  state  dynamics  are  a  function  of  the  INS  error  model  and  are  expressed  as 
a  stochastic  linear  differential  equation: 

<5*  =  F  G  (C?)  w  (3.16) 


(3.15) 
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Figure  3.2:  Block  diagram  of  feed- forward  navigation  Kalman 
filter.  The  Kalman  filter  uses  external  measurements  to  estimate 
the  errors  in  the  current  navigation  state  produced  by  an  inertial 
navigation  system  (INS).  The  estimated  errors  are  then  removed 
from  the  INS  state  resulting  in  a  nearly  optimal  estimate. 


which  is  a  function  of  the  current  estimated  trajectory  and  inertial  measurements  (see 
Section  2.6.) 


The  optical  measurement  model  describes  the  pointing  vector  from  the  aircraft 
to  the  each  target  as  a  function  of  the  current  state.  The  measurement  equation  for 
target  n  is  expressed  as 


OLn 

Pn 


=  hn(x)  +  v 


(3.17) 


where  an  and  ftn  are  the  horizontal  and  vertical  angles  to  target  n.  The  x  vector  is 
the  true  navigation  state.  Using  the  method  of  perturbations,  the  measurement  is 
linearized  about  the  estimated  aircraft  state: 


Szn  =  Hn<5x  +  v  (3.18) 

where  Hn  is  the  Jacobian  of  the  nonlinear  measurement  function,  hn  [•],  about  the 
current  state  estimate.  Note  that  the  elements  of  the  H  matrix  related  to  orientation 
error  are  not  modeled  in  this  implementation. 

The  linearized  dynamics  and  measurement  update  equations  were  incorporated 
into  an  extended  Kalman  filter  and  tested  using  flight  test  data.  The  test  trajec¬ 
tory  was  a  circular  pattern,  approximately  three  kilometers  above  four  fixed  targets, 
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chosen  to  keep  the  targets  in  the  camera  field  of  view.  The  pixel  location  to  each 
target  was  measured  by  hand  and  converted  to  angular  measurements  for  process¬ 
ing.  Incorporating  optical  measurements  improved  the  navigation  position  error  by 
74  percent  over  the  free  inertial  solution.  This  demonstration  showed  the  benefits 
of  incorporating  optical  measurements  of  objects  at  unknown  locations  with  inertial 
sensors. 

3.3.5  Strelow,  Singh,  and  Adams.  Strelow  [60-62]  approaches  the  problem 
of  integrating  optical  and  inertial  sensors  using  optimal  online  and  batch  techniques. 
Optimal  online  processing  estimates  the  current  navigation  state  conditioned  on  all 
prior  measurements,  using  a  some  statistical  measure  of  optimality.  Note  that  the  def¬ 
inition  of  optimality  is  dependent  on  the  actual  measure  chosen.  As  the  name  implies, 
online  methods  can  theoretically  be  used  to  estimate  the  navigation  state  in  real-time, 
onboard  some  vehicle.  Conversely,  batch  processing  involves  estimating  an  optimal 
trajectory,  conditioned  on  all  measurements  for  the  entire  trajectory.  Batch  methods 
typically  have  improved  performance  over  online  methods,  especially  in  highly  non¬ 
linear  situations  such  as  correspondence  solutions,  and  can  serve  to  define  the  best 
achievable  performance  for  a  given  trajectory  and  measurement  realization  [60] . 

The  online  algorithm  described  in  [61]  integrates  inertial  and  optical  measure¬ 
ments  using  an  extended  Kalman  filter  algorithm.  Using  notation  consistent  with  the 
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development  in  Chapter  II,  the  state  vector  for  n  tracked  targets,  x(t),  is 


x(£) 


P  e(t) 
ve(t) 
ve(£) 
4>(t) 
<{t) 
ab° 
b6° 

gfco 

te 


(3.19) 


where  pe,  ve,  and  ve  are  the  position,  velocity,  and  acceleration  vectors;  0  is  a 
vector  of  Euler  angles  from  the  world  frame  to  the  body  frame;  u>heb{t )  is  the  angular 
velocity  vector;  a6" ,  b6°,  and  g6°  are  the  accelerometer  bias,  gyro  bias  and  gravity 
vectors  with  respect  to  the  body  coordinate  frame  at  t  =  0,  respectively.  Finally,  the 
target  locations  in  the  world  frame  are  expressed  by  the  vectors,  tf , . . . ,  t® .  The  state 
estimate  and  covariance  is  propagated  using  the  extended  Kalman  filter  algorithm 
presented  in  Section  2.8.2. 

The  image  measurements  utilizes  the  pixel  location  measurement  for  each  target 
in  the  image,  zn(b),  which  is  expressed  as  the  projection  of  the  relative  Euclidian 
target  location  into  the  image  plane: 


^ n  (b) 


=  7T  [C e(bK(b)] 


(3.20) 


where  7r  is  the  projection  operator,  xpix  and  ypix  is  the  target  location  in  pixels, 
and  Cg(b)  is  the  DCM  from  the  ECEF  to  the  camera  frame  at  time  b-  The  three 
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dimensional  target  line  of  sight  vector,  s®  (£*)  is  defined  as 


Sn(ti)  =  t*  -  Pe(t*)  (3.21) 

In  this  derivation,  note  that  the  camera  frame  is  aligned  with  the  body  frame,  (i.e., 

ct  =  i.) 

In  addition,  the  accelerometer  and  gyroscope  measurements,  z a(ti)  and  z 
respectively,  are  incorporated  as  independent  measurements  using  the  measurement 
equations: 


za  =  CgVe  —  C^o  [g6°  +  a6°]  +  va  (3.22) 

zb  =  U)beb  -  b6  +  v6  (3.23) 

where  va  and  va  are  additive  random  noise  vectors. 

The  filter  is  initialized  using  a  short  initial  measurement  sequence  and  a  batch 
algorithm  to  estimate  visible  features.  Features  which  become  visible  after  the  filter 
is  initialized  are  incorporated  using  a  stochastic  map  approach,  described  in  [58],  if 
certain  criteria  are  met. 

The  candidate  target  must  first  be  tracked  through  a  small,  predetermined  num¬ 
ber  of  frames  in  order  to  provide  an  initial  estimate  for  the  three-dimensional  location, 
tn,  and  to  indicate  some  level  of  stability  in  the  track.  Once  a  three  dimensional  so¬ 
lution  is  available  for  a  candidate  target,  a  quality  measure  is  calculated  based  upon 
the  following  heuristic.  Let  l  describe  the  length  of  the  longest  axis  of  the  covariance 
matrix,  Ptt,  associated  with  the  candidate  target.  The  ratio  r  =  l/b  is  calculated, 
where  b  is  the  longest  baseline  between  camera  centers  used  to  calculate  the  initial 
target  location.  If  r  is  less  than  a  fixed  threshold,  the  target  state  is  added  to  the 
filter  using  the  state  augmentation  procedure  described  in  the  next  paragraph.  Note 
that  the  r  heuristic  is  a  measure  of  the  “goodness”  of  the  geometry  described  by  the 
trajectory  and  target  location. 
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The  new  target  is  added  to  the  current  state  vector.  The  state  covariance  matrix 


is  augmented  using 


P«  =  G*PraGj  +  G,P‘  Gf 

pe  _  p  p 

±  xt  —  ^X1-  XX 


(3.24) 

(3.25) 


where  Pxx  is  the  current  hlter  covariance.  Gt  and  Gx  are  the  Jacobians  of  the 
transformation  from  body  (camera)  to  ECEF  frame: 


(3.26) 


(3.27) 


According  to  the  authors,  this  method  accounts  for  noise  in  the  image  observations  and 
state  estimate,  but  does  not  account  for  any  relative  error  between  the  recent  state 
estimates.  Cases  where  this  error  is  significant  (e.g.,  point  feature  correspondence 
errors)  are  recommended  for  future  research. 

The  batch  algorithm  uses  the  Levenberg-Marquardt  method  [50]  to  minimize 
a  nonlinear  cost  function  representing  the  total  error  of  the  current  trajectory.  The 
overall  cost  function  is  the  sum  of  the  visual  and  inertial  errors: 


(3.28) 


The  visual  error  term,  Evisuai,  is  an  expression  of  the  image  reprojection  error 


(3.29) 


where  s^x  is  the  observed  pixel  location  of  target  j  at  time  tt ,  t)  is  the  location  of 
target  j  in  the  ECEF  coordinate  frame,  C((C)  is  the  transformation  from  the  world 


73 


to  camera  frame  at  time  t*,  and  7r(!“  [•]  is  the  image  projection  function.  D  (•)  is  a 
Mahalanobis  (weighted  two-norm)  distance  function. 

The  inertial  error  term,  Einertiai ,  is  an  expression  of  the  errors  in  the  position, 
velocity,  and  attitude  vectors: 

n 

Einertiai  =  ^  ^  D  [P  {U)  5  Pintertial  (^) \P inertial  {U-l)\ 

2=1 

n 

+  [V6(^)’  VintertialiU)  KnertialiU-l)] 

2=1 

n 

+  J2D  i^^^intertialiUMinertialiU-l)]  (3.30) 

1=1 

where  Pintertiai(^i)\Pinertiai(^i-i)  is  the  inertial  position  solution  at  tt  integrated  from 
the  inertial  position  solution  at  U-i,  v intertiai(^i)\YinerUai(^i-i )  is  the  inertial  velocity  so¬ 
lution  at  U  integrated  from  the  inertial  velocity  solution  at  2j_i,  and  M’lnterUai  (b )  inertial  (hi - 1 ) 
is  the  inertial  attitude  solution  (expressed  in  Euler  angles)  at  t%  integrated  from  the 
inertial  attitude  solution  at  t*_i.  p e(tj),  ve(£*),  and  V’(^)  are  the  camera  position, 
velocity,  and  attitude  at  time  .  The  distance  function,  D(-).  is  also  a  Mahalanobis 
distance  function. 

The  online  and  batch  algorithms  are  then  tested  using  a  calibrated  camera  and 
inertial  sensor  mounted  on  a  robotic  arm.  The  sensor  is  subjected  to  a  four  degree- 
of-freedom  trajectory,  resembling  a  clover  pattern.  The  tests  are  performed  using  a 
combination  of  methods: 

•  Batch  processing  using  image  and  inertial  measurements. 

•  Online  processing  using  15,  30,  and  45  images  for  initialization 

•  Batch  processing  using  images  only. 

The  results  show  the  value  of  adding  inertial  measurements  to  vision  processing. 

The  performance  of  the  image-only  batch  algorithm  is  unable  to  converge  on  a  realistic 
solution  under  the  test  conditions.  Combining  image  and  inertial  measurements  using 
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batch  processing  results  in  the  best  performance.  The  online  algorithms  combining 
inertial  and  visual  measurements  perform  much  better  than  the  visual  only,  and  show 
improved  performance  with  increased  batch  initialization  size. 

In  addition  to  the  batch  and  online  processing  methods,  Strelow  also  proposes 
incorporating  inertial  measurements  to  constrain  the  correspondence  search  between 
image  frames  [62],  This  constraint  on  the  image  search  space  is  a  similar  concept  to 
the  field  of  expansion  method  proposed  by  Bhanu  [4];  however,  Strelow  generalizes 
the  approach  by  exploiting  epipolar  geometry. 

The  projection  of  an  arbitrary  point  in  an  image  is  described  by  an  epipolar 
line  in  a  second  image  (see  Figure  3.3.)  All  epipolar  lines  in  an  image  converge  at 
the  projection  of  the  focus  of  the  complimentary  image.  Combining  knowledge  of  the 
translation  and  rotation  between  images  and  the  pixel  location  of  a  candidate  target 
in  the  first  image,  a  correspondence  search  is  then  constrained  to  an  area  near  the 
epipolar  line.  This  approach  is  shown  in  Figure  3.4. 

There  are  some  limitations  in  Strelow’s  work  which  warrant  further  research. 
First,  while  he  develops  an  initial  concept  of  using  inertial  measurements  to  constrain 
the  correspondence  search  along  an  epipolar  line,  it  is  ad  hoc  in  the  sense  that  is  not 
a  stochastically  rigorous  development.  In  addition,  the  planar  correspondence  search 
constraints  needs  to  be  expanded  to  multiple  dimensions  when  incorporating  higher 
dimensional  features.  Another  limitation  of  Strelow’s  work  is  the  limited  applicability 
to  an  airborne  environment,  since  his  experiments  are  restricted  to  captive  robot  arms 
and  land-based  robots  with  very  limited  range.  An  airborne  platform  is  expected  to 
navigate  over  a  relative  large  area,  without  visual  overlap,  for  a  period  of  hours. 
Finally,  he  does  not  exploit  a  priori  world  and  scene  information,  which  could  be  of 
significant  value  to  an  operationally  useful  navigation  system. 

In  the  next  chapter,  image  and  inertial  navigation  algorithms  are  proposed  which 
address  the  shortcomings  identified  in  the  previous  sections. 
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Figure  3.3:  Epipolar  geometry  between  two  images.  The  pro¬ 
jection  of  an  arbitrary  point  in  an  image  is  described  by  an 
epipolar  line  in  a  second  image.  All  epipolar  lines  in  an  image 
converge  at  the  projection  of  the  focus  of  the  complimentary 
image. 
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Figure  3.4:  Correspondence  search  constraint  using  epipolar 

lines.  Given  a  projection  of  an  arbitrary  point  in  an  initial  image, 
combined  with  knowledge  of  the  translation  and  rotation  to  a 
second  image,  the  correspondence  search  can  be  constrained  to 
an  area  near  the  epipolar  line.  Note  the  epipole  can  be  located 
outside  of  the  image  plane,  as  shown  in  this  example. 
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IV.  Image  and  Inertial  Fusion  Navigation  Algorithm 


his  chapter  presents  an  innovative  approach  for  rigorously  integrating  imaging 


_L  and  inertial  sensors  at  a  deeper  level  than  the  current  state-of-the-art.  The  chap¬ 
ter  is  organized  as  follows.  First,  the  image  and  inertial  fusion  problem  is  presented 
and  explained  through  a  series  of  fundamental  concepts.  Next,  these  concepts  are 
developed  using  rigorous  mathematical  techniques.  These  mathematical  techniques 
are  then  incorporated  into  an  extended  Kalman  filter  algorithm  used  to  implement 
the  image  and  inertial  fusion  algorithm  for  online  navigation.  Finally,  a  discussion 
of  tracking  maintenance  and  various  potential  image  measurements  which  can  be 
exploited  to  improve  the  navigation  solution  are  presented. 

4-1  Proposed  Image  and  Inertial  Fusion  Algorithm  Walkthrough 

In  this  section,  the  fundamental  steps  of  the  image  and  inertial  fusion  for  navi¬ 
gation  problem  are  presented  in  order  to  provide  a  conceptual  reference  for  the  forth¬ 
coming  detailed  development.  The  concept  is  based  on  the  following  assumptions: 

•  A  strapdown  inertial  sensor  is  available,  and  rigidly  mounted  relative  to  one,  or 
multiple  digital  cameras. 

•  The  digital  camera(s)  capture  images  at  some  interval.  The  time  of  image 
capture  is  known  relative  to  the  inertial  measurements. 

•  An  initial  navigation  state  with  known  accuracy  statistics  is  available. 

•  An  estimate  of  the  distance  to  objects  in  the  scene  is  available  either  through  ex¬ 
ploitation  of  knowledge  of  the  world  (e.g.,  terrain  models)  or  through  binocular 
stereopsis. 

Based  on  the  above  assumptions,  an  example  navigation  trajectory  is  shown  in  Fig¬ 


ure  4.1. 
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Figure  4.1:  Overview  of  image-aided  inertial  algorithm. 
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The  algorithm  consists  of  the  following  fundamental  steps: 


•  Image  capture 

•  Transformation  from  image  to  feature  space 

•  Propagation  of  navigation  state  to  next  image  event 

•  Feature  space  propagation  to  next  image  event 

•  Statistical  feature  correspondence 

•  Trajectory  error  estimation 

As  presented  in  Chapter  II,  the  digital  imaging  device  measures  the  light  in¬ 
tensity  pattern  projected  through  optics  onto  a  sensor.  The  projection  is  a  nonlinear 
function  of  the  scene  and  lighting  conditions,  the  camera  optics,  and  the  pose  of  the 
camera  relative  to  the  scene.  The  measured  image  can  be  thought  of  as  a  surface  in 
three  dimensions  (two  spatial  and  one  intensity)  which  is  corrupted  by  measurement 
noise,  optical  distortions,  and  possibly  spatial  aliasing. 

While  camera  images  are  easily  interpreted  by  humans,  a  transformation  of  the 
image  to  a  space  which  more  directly  supports  the  automated  navigation  algorithm 
is  desired.  This  space  is  termed  the  feature  space.  In  general  terms,  the  feature  space 
transformation  transforms  the  image  array  into  a  collection  of  feature  vectors ,  located 
in  the  feature  space.  Many  potential  feature  transformations  have  been  proposed 
(see  Section  2.7.4).  The  most  desirable  feature  space  transformation  for  exploiting 
image  navigation  information  decomposes  the  feature  space  into  separable  pose  and 
object  dimensions.  An  ideal  set  of  pose  dimensions  would  be  completely  dependent 
on  the  relative  pose  of  the  object,  but  independent  on  the  type  of  object  or  feature. 
Conversely,  the  ideal  set  of  object  dimensions  would  be  completely  independent  of 
the  relative  pose  of  the  object,  and  only  depend  on  the  type  of  object.  This  is 
similar  to  human  interpretation  of  objects.  For  example,  a  pencil  always  looks  like 
a  pencil,  no  matter  what  orientation  or  size  it  appears  to  our  eyes.  Although  no 
algorithm  is  currently  known  which  can  achieve  true  independence  between  the  pose 
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and  object  bases,  feature  transformation  algorithms  have  been  proposed  which  display 
scale  and  rotation  independence.  The  details  of  the  feature  transformation  algorithm 
investigated  during  this  work  are  discussed  in  the  next  section. 

Once  a  collection  of  features  is  calculated  for  the  current  image,  inertial  mea¬ 
surements  are  integrated  so  that  the  relative  change  in  pose  can  be  estimated  when 
the  next  image  is  captured.  As  described  in  Section  2.5.3,  the  calculated  navigation 
state  vector  can  be  expressed  as  nonlinear  differential  equations  which  are  functions 
of  the  accelerometer  and  gyroscope  measurements,  initial  conditions,  gravity  vector, 
and  Earth  rotation  rate  vector.  In  addition  to  estimating  the  mean  of  the  navigation 
state,  the  navigation  state  uncertainty  covariance  is  propagated  using  the  navigation 
error  model  and  the  Kalman  filter  propagation  relations  shown  in  Section  2.6.1  and 
2.8,  respectively. 

The  estimated  changes  in  the  navigation  state  are  then  used  to  predict  the  loca¬ 
tion  of  features  through  a  stochastic  projection  transformation.  This  transformation 
operates  on  the  pose  dimensions  of  each  feature  vector  from  the  previous  image  and 
results  in  an  estimated  pose  and  pose  uncertainty  at  the  time  of  the  next  image.  The 
concept  is  illustrated  in  Figure  4.2.  This  predicted  collection  of  feature  vectors  can 
then  be  compared  to  the  features  detected  in  the  current  image.  This  comparison 
consists  of  matching  feature  vectors  from  one  epoch  to  the  next  by  comparing  the 
respective  feature  vectors  using  a  rigorous,  statistical  weighting.  This  rigorous  sta¬ 
tistical  feature  matching  based  on  inertial  measurements  integrates  the  image  and 
inertial  sensors  at  a  deeper  level  than  previous  methods  and  is  a  significant  contribu¬ 
tion  of  this  research.  The  statistical  matching  is  described  in  detail  in  Section  4.2. 

Once  feature  matches  are  determined,  the  errors  between  the  predicted  feature 
location  and  the  actual  feature  location  are  used  to  correct  errors  in  the  navigation 
state.  This  is  accomplished  using  the  Kalman  filter  update  equations  described  in 
Section  2.8  and  discussed  for  this  specific  implementation  in  Section  4.2. 
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The  cycle  is  then  repeated  for  each  image  captured.  In  the  next  section,  a 
detailed  discussion  of  the  components  of  the  image  and  inertial  fusion  algorithm  is 
presented. 


4-2  Detailed  Discussion  of  Image  and  Inertial  Fusion  Algorithm 

In  this  section,  the  components  of  the  image  and  inertial  fusion  algorithm  are 
presented  in  more  mathematical  detail  in  order  to  illustrate  the  general  feature  trans¬ 
formation  concepts  presented  in  Section  4.1. 


4-2.1  Stochastic  Feature  Transformation.  As  mentioned  in  the  previous 
section,  the  transformation  of  a  captured  image  into  feature  space  is  an  important 
step  toward  extracting  navigation  information.  For  this  work,  this  is  accomplished 
using  a  variant  of  the  scale- invariant  feature  tracking  (SIFT)  algorithm  developed 
by  Lowe  [33].  While  the  SIFT  algorithm  is  not  a  contribution  of  this  research,  it  is 
presented  in  this  section  for  clarity  as  it  is  the  starting  point  for  much  of  the  research 
that  follows.  The  SIFT  feature  transformation  algorithm  produces  feature  vectors 
with  pose  and  object  dimensions  which  are  decoupled  in  translation,  scale,  and  rota¬ 
tion  and  partially  decoupled  in  affine  (warping)  motions.  These  characteristics  make 
this  algorithm  attractive  for  navigation  estimation,  since  the  stochastic  projection  can 
be  applied  to  the  pose  dimensions  with  only  small  effects  on  the  object  dimensions. 


The  feature  transformation  consists  of  three  components:  scale-space  decom¬ 
position,  feature  detection,  and  calculation  of  the  feature  description  vector.  The 
scale-space  decomposition  is  calculated  using  Gaussian  and  Gaussian  difference  spa¬ 
tial  filters  [34] .  A  Gaussian  spatial  filter  or  “Gaussian  blur”  is  defined  as  the  following 
function: 


g  (x,y,a)  = 


2tuj2 


2(7  2 


(4.1) 


where  x  and  y  correspond  to  the  spatial  dimensions  of  the  image  in  pixels  and  a  is 
the  standard  deviation  of  the  blurring  function.  Thus,  given  an  image,  i (x,y),  the 
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filtered  image  can  be  represented  by  a  convolution  in  x  and  y: 


l(x,y,a)  =  i(x,y)®g(x,y,a)  (4.2) 

The  above  function  can  also  be  written  equivalently  as  a  multiplication  in  the  spatial 
frequency  domain: 

Mix,  fy,  (T)  =  I (fx,  fy)G(fx,  fy,  a)  (4.3) 

where  l(fx,  fy)  is  the  two-dimensional  Fourier  transform  of  the  image  and  G  fy,  a) 
is  the  two-dimensional  Fourier  transform  of  the  Gaussian  blur  function.  For  illustra¬ 
tive  purposes,  the  spatial  frequency  response  of  the  one-dimensional  Gaussian  blur 
function  is  shown  in  Figure  4.3.  Note  the  cutoff  frequency  of  the  Gaussian  function 
decreases  as  the  standard  deviation,  a,  increases. 

The  difference  of  Gaussian  filter  is  defined  as: 

f(x,  y,  k,  a)  =  g(x,  y,  ko)  -  g(x,  y,  a)  (4.4) 

where  x  and  y  correspond  to  the  spatial  dimensions  of  the  image  in  pixels,  k  >  1 
is  the  scaling  frequency  step  constant,  and  a  is  the  base  standard  deviation.  Thus, 
given  an  image,  i (x,y),  the  filtered  image  can  be  represented  by  a  convolution  in  x 
and  y: 

d(x,  y,  k,  a)  =  i(x,  y)  ®  f(x,  y,  k,  a)  (4.5) 

Sample  impulse  responses  of  the  difference  of  Gaussian  filter  are  shown  in  Figure  4.4. 
The  difference  of  Gaussian  function  can  also  be  written  equivalently  as  a  multiplication 
in  the  spatial  frequency  domain: 


D(/r,  fy,  k,  a)  =  I(fx,  fy)F(fx,  fy,  k,  a) 


(4.6) 
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Figure  4.2:  Stochastic  feature  projection.  Optical  features  of  interest  are  mapped 
into  future  images  using  inertial  measurements  and  stochastic  projections. 


Figure  4.3:  Frequency  response  of  the  Gaussian  blur  filter  for  varying  blurring 

function  standard  deviations  (a).  The  cutoff  frequency  of  the  Gaussian  function 
decreases  as  the  standard  deviation,  a,  increases.  Only  one  dimension  of  the  function 
is  shown  for  clarity. 
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Figure  4.4:  Impulse  response  of  the  difference  of  Gaussian  filter. 


Figure  4.5:  Frequency  response  of  the  difference  of  Gaussian  filter.  The  center 

frequency  of  the  bandpass  response  is  a  function  of  the  standard  deviation  values  of 
each  component  Gaussian  function. 
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Figure  4.6:  Sample  image  of  airfield. 


The  spatial  frequency  response  of  the  one-dimensional  difference  of  Gaussian  filter  is 
shown  in  Figure  4.5.  The  spatial  bandpass  nature  of  the  difference  of  Gaussian  filter 
passes  intensity  variations  in  the  image  of  a  specific  frequency,  or  scale. 

By  varying  the  scale  parameter,  k  >  1,  the  image  can  be  decomposed  into 
multiple  scale  spaces,  each  of  which  is  centered  on  a  specific  spatial  frequency.  In  [34], 
the  scale  parameter  is  varied  geometrically  to  maintain  equal  frequency  spacing  per 
octave  of  spatial  frequency.  Given  an  initial  Gaussian  filter  standard  deviation,  do, 
and  spacing  parameter,  k,  the  i  th  difference  of  Gaussian  filter  is  defined  as 

f  0,  y,  i)  =  g (x,  y,  kl+1a0 )  -  g(z,  y,  kla0 )  (4.7) 

A  scale  decomposition  of  a  reference  image  (Figure  4.6)  is  shown  in  Figure  4.7. 
Note  as  the  center  frequency  of  the  difference  of  Gaussian  filter  increases  (with  de¬ 
creasing  i),  the  filtered  images  show  an  increased  sensitivity  to  higher  spatial  frequency 
detail. 
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Figure  4.7:  Sample  image  scale  decomposition.  As  the  filter  center  frequency  in¬ 

creases  (with  decreasing  i),  the  filtered  images  show  increased  sensitivity  to  higher 
spatial  frequency  detail. 
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After  the  image  is  decomposed  into  scale  space,  features  are  then  detected  which 
exhibit  some  local  distinctiveness.  This  is  accomplished  by  locating  a  voxel  (i.e., 
volume  element)  which  is  the  local  maxima  or  minima  of  the  26  adjacent  voxels  in 
x,  y,  and  scale  (i).  Once  a  candidate  feature  is  located  in  a  particular  scale  space, 
the  amount  of  spatial  detail  is  found  by  calculating  the  eigenvalues  of  the  following 
matrix  over  a  window,  W,  which  is  centered  on  the  candidate  feature: 

Q_  Ex,i;ew(^^)2  Ylx,yeW  VfzVfy  ^ 

_  W  Vf,vf,  £,,„ew(vy2  J 

where  Vfx  and  Vf(/  are  the  gradients  of  the  scale  space  image  in  the  x  and 

y  directions,  respectively.  As  mentioned  previously  in  Section  2.7.4,  a  zero  eigenvalue 
corresponds  to  zero  detail  in  the  direction  of  the  eigenvector,  which  results  from  a 
constant  intensity  image.  The  eigenvalues  increase  with  increasing  detail,  indicating 
a  potentially  distinct  feature.  Features  are  declared  by  thresholding  the  eigenvalues 
and  choosing  the  strongest  candidates,  which  is  refined  using  the  Harris  metric  [18], 
which  was  introduced  in  Equation  (2.95),  and  repeated  here  for  clarity: 

C(G)  =  det( G)  +  kt  [frace2(G)]  (4.9) 


which  is  equivalent  to 


6 ’'(G)  —  (1  +  2kt)ai<j2  +  kt(<Ji  +  cr2)  (4.10) 

where  o\  and  a 2  are  the  eigenvalues  of  G,  and  kt  is  a  tuning  parameter.  Smaller  values 
of  kt  favor  detail  in  both  directions,  while  larger  values  of  kf  are  more  accepting  of 
detail  in  only  one  direction  [18]. 

Once  features  are  selected  which  have  sufficient  detail  in  multiple  directions,  the 
object  dimensions  of  the  feature  vector  are  calculated  as  a  function  of  the  intensity 
gradient  of  the  filtered  scale  space  near  the  feature.  The  method  presented  by  Lowe 


builds  a  histogram  of  gradient  orientations  around  the  feature,  then  selects  a  primary 
orientation  of  gradient  vector  which  corresponds  to  the  maximum  histogram  bin. 
The  object  dimensions  of  the  feature  vector  are  then  composed  of  the  normalized 
histograms  of  gradient  around  the  feature.  For  more  information  of  the  SIFT  feature 
transformation  algorithm,  see  [33],  [34]  and  [27]. 

Thus,  an  image  at  time  tt  is  transformed  to  a  collection  of  M  vectors  in  feature 

space: 

i(x,y,U) — >  Vn  e  {1,2,  (4.11) 


where,  for  the  SIFT  feature  transformation  algorithm,  z*(ti)  is  partitioned  into  pose 
and  object  subspaces: 

Kose(U)4xi 


\(u)  = 


(4.12) 


z°nb^ct(U)128xl 


132x1 


where 
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Am 


C ti ) 


Znx  (ti) 
zny  (ti) 

&n(ti) 


0n(U) 


4x1 


(4.13) 


where  znx  and  zn  are  the  pixel  location  in  the  image,  crn  is  the  scale  of  the  feature, 
and  0n  is  the  primary  orientation  of  the  feature. 


In  the  next  section,  the  statistical  predictive  transformation  of  the  feature  vec¬ 
tors  to  a  future  time  is  presented. 
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4-2.2  Statistical  Predictive  Transformation  of  Feature  Space.  The  theory  is 
divided  into  three  sections:  estimating  the  initial  landmark  position  and  covariance 
based  on  the  feature  vectors  calculated  from  the  first  image,  using  inertial  measure¬ 
ments  to  propagate  this  augmented  state  to  the  time  of  the  second  image,  and  pro¬ 
jecting  this  landmark  position  state  onto  the  second  image  as  a  probability  density 
function  in  feature  space.  In  simpler  terms,  these  equations  allow  the  filter  to  pre¬ 
dict  where  a  stationary  feature  should  appear  in  subsequent  images,  thus  providing 
a  statistical  measure  to  constrain  the  search  space  within  the  image.  This  projection 
theory  is  novel  and  provides  the  framework  for  efficient  and  robust  image  and  inertial 
integration. 


4-2.2. 1  Landmark  Error  Statistics.  As  described  in  Section  2.7,  the 
image  produced  by  an  imaging  sensor  is  a  projection  of  the  scene  onto  the  imaging 
array.  Thus  each  location  on  the  imaging  array  corresponds  to  a  three  dimensional 
position  in  the  scene.  Each  scene  location  corresponding  to  the  pixel  location  of  a 
feature  is  defined  as  a  landmark.  The  landmark  position  corresponding  to  a  pixel 
location  is  a  nonlinear  function  of  the  navigation  state,  pixel  location  measurements 
from  the  feature  vector,  z (tf),  slant  distance  from  the  camera  to  the  landmark,  dftj). 
camera  to  body  direction  cosine  matrix,  Cbc,  and  homogeneous  camera  projection 
matrix,  Tpfx  (see  Section  2.7.1  for  a  description): 

y"  =  g  (p”(ii),  c ?(«.),  z(t,),d(U),  Cj,  Tf )  (4.14) 

The  pixel  location  measurement  at  time  ti  is  a  nonlinear  function  of  the  navigation 
state,  landmark  position,  and  camera  parameters  (defined  in  Section  2.7): 

i(U)  =  h(p”(((),  Cf(«,),y”(«i),Cj,  T H  +  v(t,)  (4.15) 
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where  v(£j)  is  a  zero- mean,  additive  white  Gaussian  noise  process  with 

{R  (ti)  U  =  tj 

'  '  (4-16) 

0  U  ^  tj 

The  slant  distance  from  the  camera  optical  center  to  the  terrain,  d(ti),  is  not 
directly  available  from  a  single  feature  measurement  and  must  be  determined  using 
additional  information.  The  methods  include  exploiting  a  model  of  the  terrain,  esti¬ 
mating  distance  using  binocular  vision,  and  leveraging  egomotion  between  multiple 
monocular  images.  The  influence  matrices  must  be  carefully  calculated  in  order  to 
properly  represent  the  contribution  of  the  distance  uncertainty,  which  may  be  highly 
dependent  on  the  navigation  state  and  measurement  errors  themselves.  A  detailed 
treatment  of  the  potential  cases  and  their  effects  on  the  distance  uncertainty  is  pre¬ 
sented  in  Section  4.3.3.  The  most  general  case,  where  d(tj)  is  a  function  of  the 
navigation  state,  the  pixel  location,  and  some  terrain  model  (h): 


d(ti)  =  d(pn(tj),  C £(£;),  z (ti),  h )  (4.17) 

is  assumed  for  the  current  development  of  the  stochastic  error  projection  equations. 
Simplifications  of  the  general  solution  can  then  be  used  to  describe  situations  where 
d(ti)  is  not  a  function  of  the  navigation  state  (e.g.,  binocular  range  estimation). 

Similarly  to  the  navigation  state,  the  calculated  landmark  position,  yn,  is  also 
modeled  as  a  perturbation  about  the  true  position: 

yn  =  yn  +  $yn  (4.  is) 

and  is  a  function  of  the  calculated  trajectory: 

y"  =  g  (p”(«i).C?((i),z(ii),d(fi).C‘,Tr)  (4.19) 
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Applying  perturbation  techniques  to  the  landmark  position  function,  the  landmark 
error,  5yn,  can  be  expressed  as  a  linear  function  of  the  errors  of  the  navigation  state 
(Sx),  pixel  measurement  model  (v(t,)),  and  terrain  model  uncertainty  (Sh): 

8yn  =  Gyx8x  +  GyhSh  +  Gyzv(ti)  (4.20) 


where  the  influence  coefficients  are  given  by 


and 


p 

'^yx 

Gyh 

Gyz 


dg 

dx 

x,z(q),d(ti),Cb  T?x 

dg 

dh 

x,z(ti),d(ti),  Cbc,T?ix 

dg 

dz 

x,z(ti),d(ti),C£,TSix 

8d  =  d  —  d 


(4.21) 

(4.22) 

(4.23) 


(4.24) 


Using  the  linearized  position  measurement,  the  landmark  error  is  a  zero-mean, 
Gaussian  random  vector.  The  landmark  error  covariance,  Pyy(U),  and  cross-correlation 
matrix,  P yx(ti)  are  defined  as 


P  yy(ti)  =  E[8y8yT }  (4.25) 

p  yx(ti)  =  E[5y5xT]  (4.26) 

Substituting  (4.20)  into  (4.25),  and  noting  the  independence  between  navigation  state 
and  pixel  measurement  errors  yields 

P  yy(U)  =  GyxE[8x8xT]GyX 
+GyhE[5h2]GTyh 

+GyzE[v(ti)vT  (ti)]GyZ  (4.27) 
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Substituting  the  previously  defined  covariance  matrices  for  the  navigation  errors,  ter¬ 
rain,  and  pixel  measurement  yields  the  final  form  of  the  landmark  position  error 


covariance: 


(4.28) 


The  cross-correlation  matrices  are  calculated  in  a  similar  manner: 


(4.29) 

(4.30) 


Note  that  P xx(ti)  is  the  covariance  matrix  of  the  navigation  state  error,  and  is  provided 
by  the  Kalman  filter. 


4-  2. 2. 2  State  Propagation.  In  this  section,  the  nominal  navigation 


state,  navigation  error  state,  and  landmark  error  states  are  propagated  from  time  t, 
to  t^+i* 

The  nominal  aircraft  navigation  state  is  propagated  forward  based  on  the  non¬ 
linear  dynamics  model  given  in  Section  2.6,  typically  using  a  nonlinear  differential 
equation  solver  (e.g.,  Euler,  Trapezoidal,  or  Runge-Kutta)  [50]. 

The  landmark  error  dynamics  are  defined  as  a  random  walk: 


Syn  =  G„w„(i) 


(4.31) 


where  Gy  is  the  dynamics  influence  matrix  and  w y(t)  is  a  zero-mean,  white  Gaussian 
noise  process  with  covariance  kernel 


E[wy(t)w^  (t  +  t)}  =  Q  yS(r) 


(4.32) 
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As  mentioned  in  Section  3.3.4,  while  the  landmarks  are  assumed  to  be  stationary 
(zero-dynamics) ,  a  small  additive  random  walk  is  added  to  compensate  for  the  effects 
of  bias  initialization  errors  and  to  prevent  covariance  eigenvalues  from  approaching 
zero  [37]. 

The  general  navigation  error  stochastic  differential  equation  is  defined  in  Sec¬ 
tion  2.6.4  as 

<5x(t)  =  F  (t)Sx(t)  +  Gx[xn(f),f]wx(f)  (4.33) 

The  navigation  and  landmark  error  covariance  propagation  dynamics  are  derived 
using  the  linearized  dynamics  models  given  in  Section  2.6.4: 


P  xx(t) 

=  F(t)Pxx(t)  +  Pxx(t)FT(t) 

+GX  (t)Qx  ( t )  G  J  (t) 

(4.34) 

P  xy(t) 

=  F(t)Pxy(t) 

(4.35) 

P  yy(f) 

=  GyQyGl 

(4.36) 

An  equivalent  expression  for  the  time  propagation  is  represented  by  the  state 
transition  matrix,  3>(^+i,  tf),  which  projects  the  navigation  and  landmark  error  co- 
variance  from  time  U  to  tl+i  [37].  The  resulting  expression  for  the  navigation  and 
landmark  error  covariance  is: 


P xx  1) 

xx(fi)*&  ifi+lUi) 

+  j  <F(ti+i,T  )GxQxG^#T(ti+i,  r)dr 

J  ti 

(4.37) 

P xy  1) 

^{fi+\Ui)P  xyifi) 

(4.38) 

Pyy(^7+l) 

=  P  yyitf)  +  (tj+ 1  —  ti)GyQyGy 

(4.39) 

4-2. 2. 3  Projection  of  Uncertainty  Statistics  onto  Image. 

The  pixel 

projection  function  is  used  to  project  the  navigation  state  and  landmark  location  into 
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the  feature  space  at  time  ti+1.  The  pixel  projection  is 


a  (<i+i)  =  h(p"(ii+1),CJ(ti+1),y"(ti+i),C‘,Tr)  (4.40) 

The  estimated  pixel  location  error,  8z(ti+i),  is  modeled  as  a  perturbation  about  the 
nominal  pixel  location: 

Sz(ti+i)  =  z(ti+i)  -  z (ti+i)  (4.41) 

where  the  estimated  pixel  location,  z(ti+i),  is  calculated  using  the  nominal  navigation 
state  and  landmark  position: 


i(t.+ 1)  =  h(p"(«.+i).Caii+i),y”(ii+i),Cj,Tr), 


(4.42) 


Perturbing  the  pixel  projection  function,  the  pixel  location  error  can  be  ex¬ 
pressed  as  a  linear  function  of  the  errors  of  the  navigation  state  and  landmark  posi¬ 
tion: 

8z(ti+1)  =  H2a,5x(ti+i)  +  HzySy(ti+1 )  (4.43) 

where 


H,y 


ax 

ah 

at 


(4.44) 

(4.45) 


The  pixel  error  covariance,  Pzz(ti+i),  is  defined  as 


Pzz(ti+i)  =  E[5z(ti+1)5zT  (ti+1)] 


(4.46) 
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Substituting  (4.43)  into  (4.46),  and  eliminating  independent  error  sources  yields  the 
pixel  location  covariance: 


(4.47) 


Finally,  the  covariance  of  the  pixel  location  errors  can  be  summarized  by  com¬ 
bining  the  equations  presented  in  the  previous  sections: 


Pzz(ti+1)  =  Uzx^(ti+llti)Pxx(ti)^T(ti+1,ti)'H^x 


+HZX  /  $  (ti+1 ,  r)  Gx  Qx  Gtx®t  (ti+ 1 ,  t)  drH Tzx 


(4.48) 


This  equation  shows  how  an  initial  navigation  state  error  covariance,  Pxx(t,),  height 
uncertainty  variance,  o\ ,  measurement  noise  (characterized  by  R),  and  process  noise 
(characterized  by  Q,  and  Q?/)  can  be  rigorously  projected  to  the  feature  space  at  a 
later  time,  U+i,  as  expressed  by  Pzz{ti+i). 

In  summary,  given  the  pixel  coordinates  of  a  stationary  ground  landmark  at  time 
U,  the  predicted  pixel  coordinates  of  the  same  landmark  at  time  ti+ 1  can  be  described 
by  the  bivariate  Gaussian  probability  density  function  given  in  Equation  (4.48).  Thus, 
the  correspondence  search  for  the  landmark  can  be  constrained  using  a  rigorous  sta- 
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tistical  confidence  threshold  (as  opposed  to  approaches  proposed  in  other  research 
which  are  ad  hoc).  In  the  following  section,  the  stochastic  projection  method  is  used 
to  predict  the  location  (and  uncertainty)  of  a  stationary  landmark  in  an  image. 

4-2.3  Statistical  Feature  Matching.  In  the  previous  section,  feature  vectors 
from  the  image  captured  at  time  are  propagated  stochastically  to  the  next  image 
capture  time,  ti+ 1.  After  the  image  is  captured  at  ti+ 1,  and  the  image  is  transformed 
into  a  collection  of  features,  the  next  step  is  to  determine  the  correspondence  between 
the  predicted  feature  set  and  the  measured  feature  set  at  tl+\.  A  statistically  rigorous 
solution  to  this  correspondence  problem  is  presented  in  this  section.  This  generalized 
solution,  based  on  rigorous  statistics  of  the  navigation  state,  is  an  extension  of  the 
statistical  projection  theory  presented  in  the  previous  section  and  is  an  additional 
contribution  which  results  in  a  novel  correspondence  technique  which  is  robust  and 
efficient. 

In  general,  determining  the  potential  correspondence  between  a  given  feature 
vector  predicted  from  an  earlier  image  and  the  collection  of  feature  vectors  implies 
defining  a  metric  which  measures  the  quality  of  match.  In  this  dissertation,  the 
quality  metric  uses  a  statistical  weighting  of  the  difference  between  the  predicted  and 
candidate  feature  vectors,  which  is  also  known  as  the  Mahalanobis  distance.  The 
Mahalanobis  distance  from  the  predicted  feature  vector,  z*(tj+i),  to  the  measured 
feature  vector,  z*(fi+i),  is  defined  as  the  weighted  inner  product: 

Dn(ti+ 1)  =  [<(^+1)  -  z*(ti+1)]TPz*z*{ti+1)[z*n(ti+1 )  -  z*(fm)]  (4.49) 

where  Pz*z*  is  the  covariance  of  the  predicted  feature  vector.  As  mentioned  previously 
in  Section  4.2.1,  the  feature  vector  covariance  consists  of  strongly  independent  pose 
and  object  descriptor  dimensions.  Assuming  the  independence  of  scale  and  rotation 
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on  the  pixel  location,  the  covariance  matrix  can  be  written  in  a  block-diagonal  form: 


¥zz{ti+i)  0  0  0 

0  Paa  0  0 


(4.50) 


0  0  P  ee  0 


0 


where  Pzz  represented  the  pixel  location  uncertainty  covariance,  Paa  represents  the 
scale  uncertainty  covariance,  Pee  is  the  rotation  uncertainty  covariance,  and  P ZdZd  is 
the  uncertainty  covariance  of  the  descriptor.  Thus,  the  associated  covariances  of  the 
features  serve  to  weight  dimensions  according  to  their  statistical  certainty.  In  the  case 
of  the  SIFT  algorithm  and  associated  pose  prediction  using  inertial  measurements, 
there  is  no  specific  information  provided  by  the  algorithm  regarding  the  statistical 
knowledge  of  PCT(T,  P^,  or  P ZdZd.  Reasonable  assumptions  regarding  their  values 
can  be  estimated  by  observing  the  statistics  of  the  feature  locator  for  a  number  of 
images.  For  the  experiments  presented  in  this  dissertation,  the  scale  and  orientation 
parameters  were  given  zero  weight,  (i.e.,  PCT(T,P^  — >  col),  and  the  distance  metric 
was  decomposed  into  a  pose  distance  and  an  object  descriptor  distance: 


Dpn{ti+ 1)  [zPn(tj_|_i)  zp(£j_|_i)]  PZpZp (b+i ) [zPn (T+i )  zp(tj_|_i)]  (4.51) 
Ddn{ti+i)  =  [zdn(ti+1) -zd(ti+1)]TPZdZd(ti+1)[zdn(ti+1) -zd(ti+1)]  (4.52) 


where 


Pzz(ti+l)  0  0 

0  oo  0 


(4.53) 


0 


0  co 


zdzd(ti+l)  —  I 


(4.54) 
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which  has  the  result  of: 


•  Including  only  the  effects  of  the  predicted  pixel  location  in  the  pose  distance 

metric; 

•  Weighting  all  components  of  the  object  descriptor  vector  equally. 

In  order  to  maintain  independence  of  the  pixel  error  measurement,  and  to  im¬ 
prove  search  speed,  the  two  distance  metrics  are  used  in  succession.  First,  the  pose 
metric  is  used  to  determine  the  subset  of  feature  vectors  existing  within  a  statistical 
confidence  bound  (e.g.,  2a).  Next,  the  object  descriptor  distance  metric  is  calculated 
for  the  subset  of  features  within  the  confidence  bound.  A  successful  match  is  declared 
if  a  feature  is  found  with  an  object  distance  below  a  pre-defined  threshold.  In  addi¬ 
tion,  to  reduce  the  potential  for  false  matches  between  objects  with  similar  features 
within  the  search  area,  a  uniqueness  filter  can  also  be  applied  which  rejects  potential 
feature  matches  which  are  not  sufficiently  distinct  in  feature  space.  This  approach 
attempts  to  maintain  a  balance  between  using  the  statistical  pose  prediction  for  suc¬ 
cessful  correspondence  while  not  “coloring”  the  pixel  error  measurements,  which  are 
assumed  to  be  independent  of  the  filter  states. 

4-2.4  Navigation  State  Error  Estimation.  Once  correspondence  is  deter¬ 
mined  between  predicted  and  measured  features,  the  pixel  location  of  the  feature  in 
the  current  image  is  used  to  estimate  and  correct  the  errors  in  the  navigation  filter. 
In  this  section,  the  measurement  equation  is  mathematically  defined  with  respect  to 
the  augmented  Kalman  filter  states. 

Recalling  Equation  (4.15),  the  pixel  location  measurement  is  a  function  of  the 
true  navigation  state,  landmark  location,  camera  parameters,  and  an  additive  noise 
term.  Using  the  extended  Kalman  filter  method  presented  in  Section  2.8.2  requires 
linearization  of  the  measurement  equation  about  the  current  state  estimate.  The 
perturbation  measurement  equation  for  feature  n  at  time  U+\  is  given  by 


5zn(ti+1)  =  Uzx5x(ti+1 )  +  Hzyn5yn(ti+1)  +  v(ti+1)  (4.55) 
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where  the  influence  matrices,  H,x  and  H zyn,  are  defined  in  Eqns.  (4.44)  and  (4.45). 
The  pixel  measurement  error,  8zn(ti+i),  is  defined  as 

^Zn(^i+l)  =  Zpn(^+l)  _  ZPu(^*+l)  (4.56) 

The  specific  calculation  of  the  influence  matrices  for  this  and  other  update  types 
is  presented  in  Section  4.3.3. 

4-3  Image  and  Inertial  Fusion  Algorithm  Extended  Kalman  Filter  Im¬ 
plementation 

In  the  previous  section,  the  image-aided  inertial  navigation  concept  was  pre¬ 
sented  in  general  mathematical  terms.  In  this  section,  the  mechanics  involved  in 
implementing  the  extended  Kalman  filter  are  discussed.  Because  of  the  complex  na¬ 
ture  of  the  filter,  specific  attention  is  given  to  selecting,  adding,  and  deleting  track 
states  from  the  filter.  In  addition,  the  method  for  calculating  the  measurement  in¬ 
fluence  matrices  are  presented  for  landmarks  of  opportunity,  landmarks  at  known 
locations  (reference  landmarks),  and  celestial  measurements. 

4-3.1  Kalman  Filter  Description.  An  Extended  Kalman  Filter  is  con¬ 
structed  to  estimate  the  errors  in  the  calculated  system  parameters.  In  order  to 
minimize  the  effects  of  linearization  errors,  the  system  parameters  are  periodically 
corrected  by  removing  the  current  error  estimate  [37] .  A  block  diagram  of  the  system 
is  shown  in  Figure  4.8. 

As  defined  in  Section  2.6.4,  the  navigation  error  state  vector,  Sx,  consists  of  posi¬ 
tion,  velocity,  attitude,  accelerometer  bias,  and  gyroscope  bias  errors  and  is  expressed 
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Figure  4.8:  Image-aided  inertial  navigation  filter  block  diagram.  In  this  filter,  the 

location  of  stationary  objects  are  tracked  and  used  to  estimate  and  update  the  errors 
in  an  inertial  navigation  system.  The  inertial  navigation  system  is,  in  turn,  used  to 
support  the  feature  tracking  loop. 


as  a  vector  of  fifteen  elements: 


cfpn 


Svn 


hx  = 


*l> 


a 


b 


(4.57) 


b1 


15x1 


Landmark  tracks  of  interest  are  augmented  to  the  above  state  vector  using  the  pro¬ 
cedure  described  in  Section  4. 3. 2. 2. 
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4-3.2  Feature  Tracking  Algorithm.  In  this  section,  the  tracking  algorithm 
is  presented  which  describes  the  process  of  selecting,  adding,  and  deleting  states 
corresponding  to  specific  landmarks  from  the  filter. 

4-3.2. 1  Track  Selection.  As  mentioned  in  the  previous  section,  each 
landmark  track  requires  three  filter  states  to  estimate  the  location  of  the  landmark  in 
the  navigation  frame.  Practical  limitations  on  computer  resources  effectively  limit  the 
number  of  landmarks  that  can  be  tracked  concurrently.  This  limitation  necessitates 
the  implementation  of  a  track  management  algorithm  to  decide  which  features  to 
track. 

The  general  concept  for  the  track  maintenance  algorithm  is  to  add  and  prune 
landmark  tracks  in  order  to  provide  the  “best”  navigation  information  to  the  filter. 
Although  the  optimal  landmark  choices  are  highly  dependent  on  the  trajectory  and 
scene,  some  general  “rules-of-thumb”  were  used  in  the  track  maintenance  algorithm. 

In  general,  features  which  can  be  easily  and  accurately  tracked  for  long  peri¬ 
ods  of  time  provide  the  best  navigation  information.  This  implies  choosing  features 
which  are:  strong  and  easily  identified  (to  help  maintain  track),  locally  distinct  (to 
eliminate  false  correspondence),  and  well-separated  in  image  space  (to  maximize  fil¬ 
ter  observability).  Thus,  when  Kalman  Filter  landmark  track  states  are  available,  the 
feature  space  of  the  current  image  is  searched  and  new  landmarks  are  added  based  on 
the  above  criteria.  The  filter  states  are  augmented  in  accordance  with  the  stochastic 
projection  algorithm  defined  in  Section  4.2.2. 

In  order  to  maintain  only  the  best  tracks,  stale  landmark  tracks  (i.e.,  no  suc¬ 
cessful  correspondence  available  for  a  given  period  of  time)  are  pruned  by  replacing 
the  associated  filter  states  with  new  candidate  tracks.  Other  track  maintenance  ap¬ 
proaches  are  possible  which  could  theoretically  improve  the  track  performance  (e.g., 
semi-causal,  multiple  model,  or  velocity  prediction);  however,  these  approaches  will 
not  be  pursued  in  this  document. 
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4-3. 2. 2  Track  Addition.  Once  a  new  track  is  identified,  the  navigation 
filter  must  maintain  an  estimate  of  the  landmark  location  and  uncertainty.  This  is 
accomplished  by  replacing  the  current  filter  states  corresponding  to  an  inactive  track. 
For  simplicity  of  implementation,  the  filter  maintains  a  static  number  of  landmark 
track  states.  When  no  landmark  is  currently  being  tracked  in  a  particular  state, 
there  are,  by  definition,  no  available  measurements  which  affect  those  states.  Because 
the  landmark  track  states  are  not  dynamically  coupled  with  the  navigation  states, 
inactive  states  have  no  effect  on  the  navigation  solution,  which  allows  a  simple  state 
replacement  strategy  versus  a  more  complicated  dynamical  approach.  An  example 
illustrating  the  concept  is  presented  below. 

In  this  example,  the  states  corresponding  to  the  n-tli  landmark  are  updated  with 
a  new  track.  In  order  to  rigorously  augment  the  filter  states,  the  landmark  location 
uncertainty  covariance  and  cross-correlation  matrices  (calculated  in  Section  4.2.2)  are 
required.  For  this  illustration,  the  new  uncertainty  covariance  and  cross-correlation 
matrices  are  given  by  Dfy  and  P  +  ,  respectively.  As  shown  in  Figure  4.9,  the  n-th 
landmark  error  states  are  replaced  with  the  d)y + .  Because  the  landmark  position 
errors  are  defined  about  the  mean,  the  initial  error  estimate  is  the  03xi  vector.  Addi¬ 
tionally,  the  existing  covariance  matrices  are  replaced  with  either  the  newly  calculated 
matrices,  or  zeros  as  indicated  by  the  figure. 

4-  3. 2. 3  Track  Deletion.  The  feature  tracking  algorithm  maintains  a 
list  of  the  latest  successful  feature  update  for  each  track.  Tracks  which  have  not  been 
successfully  updated  in  a  given  time  or  number  of  image  frames  are  identified  for 
replacement  using  the  method  described  in  the  previous  section. 

4-3.3  Measurement  Models.  In  this  section,  the  influence  matrices  relating 
to  various  measurement  types  are  derived  and  compared. 

4-3.3. 1  Landmark  of  Opportunity.  The  first  measurement  of  interest 
is  the  landmark  of  opportunity.  As  described  in  Section  4. 2. 2.1,  incorporating  land- 
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marks  of  opportunity  into  the  navigation  filter  requires  two  steps:  1)  determining  the 
initial  landmark  location  and  associated  influence  matrices,  and  2)  calculating  the 
measurement  errors  and  associated  influence  matrices. 

The  method  to  determine  the  initial  landmark  location  and  associated  influence 
matrices  varies  primarily  on  the  method  used  to  estimate  the  distance  to  the  landmark. 
Three  methods  are  addressed  in  this  section:  monocular  imaging  with  statistical 
terrain  model,  binocular  stereopsis  with  no  terrain  model,  and  time-iterated. 

4-  3. 3. 2  Landmark  Location  Estimation  Using  Statistical  Terrain  Model. 

In  this  section,  the  landmark  initial  location  estimate,  yn,  is  calculated  given  the 
undistorted  pixel  location  of  a  feature  of  interest,  z,  and  a  statistical  estimate  of 
the  terrain  height  above  the  navigation  reference  plane,  ht.  An  illustration  of  the 
measurement  geometry  is  shown  in  Figure  4.10. 

The  homogeneous  pointing  vector  in  the  camera  frame  is 


(4.58) 

The  resulting  landmark  location  is  given  by 

y"  =  P"  +  C£  [pL,  +  dd£\  (4.59) 

where  d  is  the  distance  to  the  landmark,  projected  in  the  camera  z  direction.  This 
distance  is  a  function  of  the  navigation  state  and  the  mean  of  the  terrain  height  and 
is  derived  by  expanding  and  rearranging  Equation  (4.59): 

dc;c%f  =  y"  -  [p”  +  CfpJJ  (4.60) 


Extracting  the  z  component  and  dividing  by  the  resulting  scalar  yields  the  landmark 
distance: 


d 


n 


-  [p”  +  cjpL,L 

[C?C‘s«], 


(4.61) 
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Figure  4.9:  Landmark  track  state  replacement.  When  replacing  a  stale  landmark 
track  with  a  new  landmark  track,  both  the  covariance  and  cross-correlation  matrices 
must  me  updated  to  reflect  the  new  geometry. 


CAMERA 


Figure  4.10:  Monocular  imaging  geometry.  The  line  of  sight  vector  from  the  camera 
to  the  landmark,  sc,  is  a  function  of  the  landmark  location,  yn,  sensor  platform 
location,  pn,  and  the  lever  arm  from  the  inertial  sensor  to  the  camera,  pbcarn • 


105 


where  the  [-]z  transformation  is  the  linear  functional  producing  the  z  vector  compo¬ 
nent,  which  is  defined  as 


[pL 


0  0  1 


P  =Pz 


(4.62) 


Note  the  operation  reduces  vectors  to  scalars  and  matrices  to  vectors. 

By  definition,  the  z  component  of  the  landmark  location  is  given  by  the  terrain 
height,  ht: 

[y”L  =  h  (4.63) 


Substituting  Eqn.  (4.63)  into  (4.61)  yields 


d 


ht  ~  [ Pn  +  C%pbcam] 


(4.64) 


Once  the  location  of  the  landmark  is  calculated,  the  influence  matrices  from 
the  various  uncertainties  can  be  calculated.  For  this  example,  the  following  error 
contributors  are  considered: 


•  Position  uncertainty  of  the  platform,  dp” 

•  Velocity  uncertainty  of  the  platform,  8vn 

•  Attitude  uncertainty  of  the  platform, 

•  Attitude  uncertainty  of  the  camera,  a 

•  Pixel  measurement  uncertainty,  5z 

•  Terrain  elevation  uncertainty,  8ht 

The  influence  matrix  corresponding  to  the  position  uncertainty  is  defined  as 


G 


yp 


dyn 

8pn 


(4.65) 


Evaluating  the  partial  derivative  yields 
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yp 


•■3x3 


+  cnhcb~c 


2r 


dd 

<9p" 


(4.66) 
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where 


dd 


dpn  [C£C*sc 


0  0  1 


(4.67) 


The  influence  matrix  corresponding  to  the  platform  attitude  uncertainty  is  de¬ 
fined  as 


Gyip  ~ 


dyn 

dip 


(4.68) 


Evaluating  the  partial  derivative  using  the  method  described  in  Section  2.10  yields 


=  c;  (  [pL,  +  x)  +  cjc5s< 


dd 

dip 


(4.69) 


where 


dd  ,  [(CfcCgSc)  x], 
dip  [CbCbsc]z 


(4.70) 


The  influence  matrix  corresponding  to  the  camera  attitude  uncertainty  is  defined 


r  _  dyn 
ya  da 

Evaluating  the  partial  derivative  yields 

Gya  =  -dCnbCbc[scx]  +  CnbCbcsc— 


(4.71) 


(4.72) 


where 


dd  =  [cfcprx)], 

da  [CJCJCJs'], 


(4.73) 


The  influence  matrix  corresponding  to  the  pixel  location  measurement  uncer¬ 
tainty  is  defined  as 

dyn 

g- = -k  <474) 

Evaluating  the  partial  derivative  yields 


12x2 
0  0 


dd 

b^c~ 


+  crcy^ 


(4.75) 
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where 


dd 

dz 


d- 


pnporrc 

■*"  pix 


12x2 

0  0 


[C?Cfe‘ 


(4.76) 


Finally,  the  influence  matrix  corresponding  to  the  terrain  elevation  uncertainty 
is  defined  as 


n  _  dyn 

Gyh  dht 


(4.77) 


Evaluating  the  partial  derivative  yields 


Gyh  ~ 


[C?Cfe‘ 


-C%Ccsc 


(4.78) 


The  remaining  influence  matrices  are  all  zeros. 

Once  a  landmark  state  has  been  added  to  the  filter,  additional  measurements 
of  the  pixel  location  of  the  target  are  processed  using  the  measurement  equation: 


z 


rjipiajgC 


(4.79) 


where  sc  is  the  homogeneous  form  of  the  line-of-sight  vector,  sc,  defined  as 

S'  =  C i  [C*  (y"  -  p”)  -  pL.]  (4.80) 


The  calculation  of  the  measurement  influence  matrices  proceeds  in  a  similar 
fashion.  The  influence  matrix  corresponding  to  the  position  uncertainty  is  defined  as 
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zp 


dz 

dpn' 


(4.81) 


Evaluating  the  partial  derivative  yields 


nzp  =  Tfx 


c  dsc  c 

dsc 

^zdpn 

dpn 
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(4.82) 
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where 


dsc 

8pn 


-Ccbcbn 


(4.83) 


The  influence  matrix  corresponding  to  the  camera  misalignment  is  defined  as 


H 


za 


dz 

da 


(4.84) 


Evaluating  the  partial  derivative  yields 


nza  =  Tr 


o  c  ds^_ 
'’z  dot 


-111, 


ter 


(4.85) 


where 


dsc 

da 


(scx) 


(4.86) 


The  influence  matrix  corresponding  to  the  landmark  location  uncertainty  is 


defined  as 


H 


zy  — 


dz 

dyn 


(4.87) 


Evaluating  the  partial  derivative  yields 


H  zy  =  Tf* 


c  dsc  c 

dsc 

zQyn 

_dyn  _ 

z 

(4.88) 


where 


f)cC 

_ r^cr^b 

dbr  ~  b  n 


All  other  influence  matrices  are  zero. 


(4.89) 


Jy.3.3.3  Landmark  Location  Estimation  Using  Binocular  Stereopsis. 
Another  method  for  determining  the  distance  of  an  object  within  an  image  is  to  utilize 
binocular  measurements.  Binocular  measurements  do  not  require  a  terrain  model, 
which  makes  this  method  appropriate  for  navigation  in  unknown  environments,  or 
when  the  sensor  is  on  or  near  the  ground.  In  order  to  properly  interpret  binocular 
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BINOCULAR  REFERENCE  FRAME  CENTER 


'\CAMERA  A 


Figure  4.11:  Binocular  imaging  geometry.  The  line  of  sight  vector  from  the  binoc¬ 
ular  reference  point  to  the  landmark,  Sq°,  is  a  function  of  the  landmark  location,  yn, 
sensor  platform  location,  p",  the  lever  arm  from  the  inertial  sensor  to  the  binocular 
reference  point,  pg,  and  the  respective  lever  arms  from  the  binocular  reference  point 
to  cameras  a  and  b,  p c°arUa  and  p';;“mt ,  respectively. 

measurements,  it  is  convenient  to  establish  a  binocular  disparity  reference  frame, 
which  is  located  between  the  optical  centers  of  two  cameras.  The  resulting  binocular 
imaging  geometry  is  shown  in  Figure  4.11. 

Solving  for  a  landmark  location  using  binocular  measurements  requires  three 
steps:  1)  feature  selection  and  matching  between  binocular  image  pairs,  2)  calculation 
of  the  relative  line  of  sight  vector  from  the  binocular  origin  to  the  landmark,  and 
3)  estimating  the  landmark  location  and  augmenting  the  Kalman  filter. 

The  feature  selection  and  matching  process  begins  by  selecting  a  candidate 
feature  from  one  of  the  images  in  the  binocular  pair,  using  the  same  location  quality 
metric  presented  previously  for  the  monocular  imaging  calculation.  This  establishes  a 
candidate  binocular  feature.  Once  a  base  feature  is  chosen,  the  feature  location  of  the 
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binocular  match  is  statistically  projected  into  the  feature  space  of  the  corresponding 
image.  For  example,  given  the  location  of  a  feature  from  image  a,  z a(ti),  a  statistical 
binocular  projection  is  desired  to  predict  the  corresponding  binocular  match  mean 
and  uncertainty  in  the  feature  space  of  image  b.  The  pixel  location  in  the  b  frame  can 
be  described  by 

zb  =  T^'xsCf'  (4.90) 

and 

s“  =  [*C£T£X  +  p2m„  -  PSJ  (4,91) 

where  TTX  and  T('“  are  the  camera  projection  matrices  for  camera  a  and  b,  respec¬ 
tively,  Ccc°  and  are  the  orientation  direction  cosine  matrices  for  camera  a  and 
b,  respectively.  In  addition,  p^mo  and  p^mb  are  the  location  vectors  from  the  Co 
frame  for  camera  a  and  b,  respectively  (see  Figure  4.11).  Finally,  k  is  the  (unknown) 
distance  parameter. 

Variations  in  k  describe  a  portion  of  the  epipolar  line  in  the  secondary  image. 
Depending  on  the  system  conditions,  some  or  all  of  the  parameters  can  be  treated 
as  random  variables.  In  this  example,  the  relative  camera  location  and  orientations, 
distance,  and  pixel  location  errors  are  modeled  as  random  variables  with  Gaussian 
distributions.  While  the  distribution  of  the  distance  parameter,  k,  is  most  accurately 
modeled  as  a  uniform  random  variable,  in  order  to  simplify  the  calculation  of  the 
search  boundary,  a  Gaussian  distribution  is  desired  which  is  representative  of  the 
uniform  random  variable  when  projected  into  the  corresponding  image.  An  acceptable 
Gaussian  distribution  for  k  which  results  in  accurate  predictions  of  the  binocular  pixel 
search  space  is  empirically  determined  as  a  function  of  the  binocular  disparity.  The 
binocular  disparity,  dbinoc,  is  defined  as 

dbinoc  =  II  Pcama  —  PcamJI  (4.92) 
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The  distance  parameter,  k,  is  thus  modeled  as  a  Gaussian  distribution  with  mean  and 
variance  given  by 


k  =  E[k]  =  4.5dbinoc  (4.93) 

4  =  Ep  -  W\  =  9<iL«  (4.94) 


which  is  determined  empirically  to  result  in  a  reasonable  approximation  of  the  binoc¬ 
ular  search  space  when  projected  along  the  epipolar  line  (see  Figure  12(b)). 

The  predicted  pixel  location  mean  is  calculated  using  the  mean  values  of  the 
parameters  (see  Equations  (4.90)  and  (4.91)): 


=  f  [2 
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(4.95) 


Assuming  independent  error  sources,  the  pixel  location  error  covariance  matrix  is 
given  by 


Zb  Zb 


^-ZbZa^'ZaZa^-ZbZa  ^~ZbPa  ^ PaPa  ^^ZbPa  Ot-aOt-a^- ZbOLa 


TT  p  TT/ 

^-ZbPb*  PbPb^-ZbPb 


4“HzbabP  abab  K*b  +  H^Hh  +  ^zbzb 


(4.96) 


where  RZaZa  and  RZbZb  are  the  (independent)  pixel  location  error  covariance  matrices 
for  images  a  and  b ,  respectively.  The  binocular  lever  arm  location  and  orientation 
uncertainty  covariances  are  given  by  PPaPa  and  PQ.(iQa  for  camera  a  and  PPbPb  and  Pabab 
for  camera  b.  The  influence  matrices  are  defined  as  the  partial  derivatives  calculated 


112 


about  the  estimated  states  (x): 
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(4.97) 

(4.98) 

(4.99) 

(4.100) 

(4.101) 

(4.102) 


Once  the  feature  location  has  been  projected  into  the  corresponding  binocular 
feature  space,  the  matching  process  consists  of  determining  the  features  which  are 
within  a  given  statistical  distance  of  the  prediction,  then  choosing  the  feature  which 
minimizes  the  comparative  feature  description  distance.  This  is  equivalent  to  the 
procedure  discussed  in  Section  4.2.3.  An  example  of  the  binocular  matching  process 
is  shown  in  Figures  12(a)  and  12(b). 

Once  a  binocular  pair  of  pixel  locations  corresponding  to  a  landmark  have  been 
determined,  the  next  step  is  to  estimate  the  relative  line  of  sight  vector  from  the 
origin  of  the  binocular  frame  to  the  landmark.  From  Figure  4.11,  the  respective  pixel 
locations  can  be  expressed  as  a  function  of  the  relative  line  of  sight  vector: 


z“  =  h.[8?,CS,p2ma,T£']  (4.103) 

z‘  =  ht[sS>,C2.p2mi.Tf]  (4.104) 

where  s),0  is  the  relative  line  of  sight  vector,  which  can  be  estimated  using  nonlinear 
regression  techniques  such  as  iterative  least  squares  (see  Section  2.9). 
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(a)  Left  image. 


(b)  Right  image. 


Figure  4.12:  Binocular  feature  prediction.  The  location  of  a  selected  feature  (blue  + 
with  red  circle)  is  predicted  from  the  left  image  to  the  right  image  using  a  statistical 
model  of  the  relationship  between  the  two  cameras.  Note  the  2a  bound  agrees  well 
with  the  range  variation  line.  The  range  variation  line  indicates  where  the  predicted 
landmark  will  appear  at  a  distance  from  1  meter  to  1000  meters  (from  left  to  right). 
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Expanding  equations  (4.103)  and  (4.104)  yields 


z 


z 


,a 


,b 


rppiXgCa 
ca  — a 


(4.105) 

(4.106) 


where 


(4.107) 

(4.108) 


The  estimation  procedure  results  in  an  estimated  line  of  sight  vector,  Sq°,  and 
an  associated  covariance  matrix,  PSQS0.  The  landmark  location  in  the  navigation 
frame  can  now  be  estimated  by  substituting  the  line  of  sight  vector  estimate  and  the 
navigation  state  estimate  into  the  landmark  location  equation: 


(4.109) 


Once  the  landmark  location  and  the  associated  covariance  and  cross-correlation  ma¬ 
trices  have  been  calculated,  the  Kalman  filter  states  can  be  augmented  using  the 
procedure  described  in  Section  4. 3. 2. 2. 

This  procedure  has  assumed  the  binocular  regression  algorithm  converges  to 
the  correct  minimum.  When  the  landmark  is  a  large  distance  away,  relative  to  the 
binocular  disparity,  pixel  measurement  noise  can  create  a  situation  where  the  two  line 
of  sight  vectors  are  skewed  outward  and  result  in  negative  (i.e.,  impossible)  distance 
estimates.  This  condition  can  be  easily  detected  prior  to  the  regression  and  another 
binocular  feature  can  be  selected. 


4-  3. 3. 4  Landmark  Location  Estimation  Using  Egomotion  and  Monocular 


Measurements.  In  the  absence  of  terrain  information  or  binocular  images,  inertial 
measurements  can  be  exploited  to  estimate  the  distance  to  landmarks  using  egomo- 
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tion.  In  this  scheme,  the  pixel  location  of  a  selected  landmark  is  measured  over  two 
or  more  images  until  the  distance  becomes  observable.  Once  the  distance  is  observ¬ 
able,  the  feature  is  tracked  (and  used  for  navigation)  by  augmenting  the  Kalman  filter 
state  vector  in  an  identical  fashion  to  that  described  in  Section  4. 3. 2. 2.  The  concept 
of  utilizing  a  series  of  monocular  measurements  with  unknown  terrain  information  is 
not  new  (see  Strelow  [62]),  however,  this  research  improves  upon  Strelow’s  method  by 
leveraging  inertial  measurements  to  statistically  constrain  the  correspondence  search. 

The  proposed  procedure  consists  of  two  steps:  1)  given  a  feature  location  in  an 
image  at  time  tt,  predicting  the  feature  location  in  the  image  at  time  ti+ 1,  and  2) 
estimating  the  landmark  location  from  a  series  of  pixel  measurements.  The  statistical 
feature  prediction  method  is  presented  first. 

Predicting  the  location  of  a  feature  from  one  image  to  another  using  inertial  mea¬ 
surements  is  similar  to  the  binocular  prediction  technique  presented  in  Section  4. 3. 3. 3, 
except  that  the  binocular  disparity  is  estimated  using  the  change  in  navigation  state. 
The  feature  selection  and  matching  process  begins  by  selecting  a  candidate  feature 
from  the  image  at  time  tt ,  using  the  same  location  quality  metric  presented  previ¬ 
ously  for  the  monocular  imaging  calculation.  Once  a  candidate  feature  is  chosen, 
the  feature  location  is  statistically  projected  into  the  feature  space  at  time  ti+ 1.  The 
monocular  egomotion  imaging  geometry  is  shown  in  Figure  4.13.  For  example,  given 
the  location  of  a  feature  from  the  image  at  time  fj,  z(T),  a  statistical  projection  is 
desired  to  predict  the  corresponding  feature  location  in  the  feature  space  at  time  ti+\. 
The  pixel  location  at  time  U+i  can  be  described  by 

z  (ti+1)  =  TfV(ti+1)  (4.110) 


and 

S'ft+i)  =  Cg{Cf  M  [CJ(ti)  (fcCjs'fo)  +  pJJ  +  p”(f.)  -  p”(ii+i)] 


P  camS 

(4.111) 
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Figure  4.13:  Monocular  egomotion  geometry.  The  line  of  sight  vector  from  the 

camera  location  at  time  ti+ 1  is  a  function  of  the  landmark  location  and  the  camera 
egomotion  from  time  t,  to  ti+1. 

where  and  pn(b+i)  are  the  positions  of  the  vehicle  at  times  ti  and  tl+i,  respec¬ 
tively.  In  addition,  the  direction  cosine  matrices  and  C^(ti+i)  are  the  body 

to  navigation  frame  DCMs  at  times  and  tl+\.  The  unknown  distance  parameter  is 
given  by  k.  Finally,  the  homogeneous  landmark  location  vector  at  time  t,  is  a  function 
of  the  pixel  measurement,  z(tt): 


(4.112) 


Variations  in  k  describe  a  portion  of  the  epipolar  line  in  the  second  image.  While 
k  is  most  accurately  described  as  a  uniform  random  variable,  in  order  to  simplify  the 
calculation  of  a  search  boundary,  a  representative  Gaussian  distribution  is  desired. 
From  Equation  (4.111),  the  distance  between  the  camera  location  at  time  tt  and  t,:+i 
is  defined  as 
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In  a  similar  fashion  to  that  presented  in  Section  4. 3. 3. 3,  the  distance  parameter,  k,  is 
modeled  as  a  Gaussian  distribution  with  mean  and  variance  given  by 


k  =  E  [k]  =  4.5  d 
o\  =  E[(fc  -  k )2]  =  9 d2 


(4.114) 

(4.115) 


The  predicted  pixel  location  mean  and  covariance  can  now  be  calculated.  Gen¬ 


eralizing  Equations  (4.110)-(4.112)  results  in  the  nonlinear  measurement  equation: 


where  v(£i+i)  is  additive,  white,  Gaussian  noise.  The  pixel  location  error  at  time  ti+ 1 
is  given  by 


5z(ti+1)  =  H ZXiSx(ti)  +  HZXi+16x(ti+1 )  +  H zk5k  +  H^vfo)  +  HZZi+1v(ti+1 )  (4.117) 


where  fix(t*)  is  the  navigation  state  error  at  time  ,  5x(ti+ 1)  is  the  navigation  state 
error  at  time  U+i,  v(7()  is  the  measurement  noise  at  time  t(,  and  v(t(+1)  is  the  mea¬ 
surement  noise  at  time  ti+ 1.  The  influence  matrices  are  given  by  Hz  and  are  the 
respective  partial  derivatives  of  the  measurement  function. 

Calculation  of  the  pixel  location  error  covariance  requires  knowledge  of  the  state 
transition  matrix  from  time  tj  to  time  U+ 1,  and  the  additive  process  noise 

matrix,  Q^,  which  are  available  from  the  Kalman  filter  propagation  equations  (see 
Section  2.8).  The  navigation  error  covariance  can  now  be  expressed  as  a  function  of 
the  state  transition  matrix  and  process  noise: 


(4.118) 
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The  pixel  location  uncertainty  at  time  ti+1  is  defined  as 


~Pzz{ti+i)  =  E  [ Sz(ti+1)5z(ti+1)T ] 


(4.119) 


Substituting  Equations  (4.117)-(4.118)  into  (4.119),  and  noting  the  independence  of 
the  pixel  measurement  noise  and  initial  distance  estimate  yields  the  pixel  location 
uncertainty: 


+  H22iR22(tj)Hjz,  +  R22(fj+i) 


(4.120) 


Now  that  the  predicted  pixel  location  mean  and  covariance  are  determined,  the  cor¬ 
respondence  search  for  the  feature  of  interest  now  proceeds  in  an  identical  fashion  to 


the  binocular  method.  This  search  procedure  can  be  accomplished  for  each  candi¬ 


date  feature  in  an  image,  and  results  in  a  list  of  correspondences  between  features  in 
multiple  images. 

Given  a  successful  correspondence  between  a  single  feature  or  a  collection  of 
features,  the  measurements  can  be  used  to  estimate  the  relative  location  of  each 
landmark.  Once  a  landmarks  has  an  initial  position  estimate,  the  navigation  state 
vector  is  augmented  and  subsequent  measurements  are  used  to  refine  the  landmark 
location  and  estimate  the  errors  in  the  navigation  state.  A  complete  method  to 
accomplish  this  task  is  presented  by  Strelow  [62], 

This  approach  has  some  limitations  which  warrant  discussion.  The  first  issue 
is  the  potential  lack  of  distance  observability  when  the  camera  translation  distance 
is  small  relative  to  the  landmark  distance.  This  is  equivalent  to  the  issues  encoun¬ 
tered  when  attempting  to  estimate  the  location  of  distant  landmarks  in  a  binocular 
scenario  (see  Section  4. 3. 3. 3).  In  addition,  even  when  sufficient  egomotion  exists  in 
order  to  observe  the  landmark  position,  unconstrained  optical  methods  suffer  from 
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the  so-called  scale  ambiguity  [36] .  This  scale  ambiguity  limits  the  observability  of  an 
image  correspondence  measurement  to  relative  angles  and  a  direction  of  translation 
vector.  The  scale  of  the  translation  is  not  observable.  While  this  is  not  a  catastrophic 
condition,  any  errors  in  the  estimated  velocity  state  result  in  a  distance  bias,  which 
causes  an  un-modeled  accumulation  of  errors  in  the  extended  Kalman  filter.  In  order 
to  compensate  for  this  effect,  the  process  noise  must  be  increased  as  discussed  in  [38]. 
An  example  of  this  effect  is  demonstrated  via  simulation  in  Chapter  VI. 

4-4  Absolute  Measurements 

In  the  previous  section,  a  theory  for  exploiting  the  apparent  location  of  land¬ 
marks  in  multiple  images  was  presented,  where  incomplete  information  regarding  the 
landmark  location  was  available.  This  is  a  form  of  relative  navigation,  because  the 
estimated  location  of  the  landmark  (and  navigation  state)  are  all  relative  to  previous 
estimates  of  the  navigation  state,  and,  by  definition,  multiple  measurements  in  time 
are  required  to  extract  navigation  information. 

An  absolute  measurement  can  be  exploited  when  complete  position  information 
is  available  for  a  given  landmark.  In  contrast  to  relative  measurements,  absolute 
measurements  do  not  suffer  from  long-term  drift  due  to  integration  of  errors.  In 
addition,  since  absolute  measurements  do  not  require  estimation  of  the  landmark 
location,  no  additional  filter  states  are  required.  In  this  section,  three  types  of  absolute 
landmark  measurements  are  presented  and  analyzed. 

4-4-1  Reference  Landmark.  A  reference  landmark  requires  both  the  prior 
knowledge  of  the  physical  location  relative  to  the  navigation  frame  and  the  feature 
descriptor  vector.  Performing  a  reference  landmark  measurement  requires  three  steps: 
prediction  of  the  landmark  location  in  feature  pose  space,  performing  a  statistically- 
constrained  correspondence  search  for  the  feature  of  interest  in  the  current  measured 
feature  vectors,  and  finally  incorporating  the  measured  location  of  the  reference  land¬ 
mark  into  the  filter. 
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Given  a  known  reference  landmark  with  position,  yn,  and  uncertainty,  Pyy,  the 
pixel  location  in  the  current  image  is  a  nonlinear  function  of  the  navigation  state  and 
the  landmark  location: 

z  =  Tfxsc  (4.121) 

where 

=  C£  [C*  (y"  -  p”)  -  pL„]  (4.122) 

The  pixel  location  uncertainty,  Pzz,  is  defined  as 

Pzz  =  E  [SzSzT]  (4.123) 

where 

Sz  =  Hza;(5x  +  Hzy5y  +  v  (4.124) 

The  influence  matrices,  Hzx  and  HZJ/  are  the  partial  derivatives  of  the  measurement 
equation  with  respect  to  the  navigation  state  and  landmark  errors,  respectively.  The 
pixel  measurement  errors  are  represented  by  v.  Substituting  Equation  (4.124)  into 
(4.123),  and  noting  the  independence  between  the  navigation,  landmark,  and  pixel 
measurement  errors  results  in  the  following  expression  for  the  pixel  location  uncer¬ 
tainty  covariance 

Pzz  =  H2a;P  xxPi^x  +  nzyP  yyp^y  +  Rzz  (4.125) 

The  main  difference  between  the  statistics  of  relative  and  absolute  landmarks  is  the 
independence  of  the  landmark  position  and  navigation  state  errors. 

Once  the  pixel  location  and  associated  covariance  matrix  have  been  calculated, 
the  next  step  is  to  determine  the  potential  correspondence  between  the  features  mea¬ 
sured  from  the  current  image.  This  procedure  is  identical  to  that  presented  in  Sec¬ 
tion  4.2.3,  where  the  feature  distance  is  calculated  for  all  features  within  a  search 
region  defined  by  statistics  of  the  predicted  pixel  location. 
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Finally,  if  there  is  a  successful  match,  the  measured  pixel  location  errors  are 
used  to  update  the  navigation  states.  The  rightmost  two  terms  in  Equation  (4.124) 
are  combined  to  produce  the  measurement  equation: 


(4.126) 


Sz  =  H  zxSx  +  v* 


where 


(4.127) 

(4.128) 


4-4- 2  Revisitation  of  Previously  Tracked  Landmark.  When  navigating  us¬ 
ing  landmarks  of  opportunity  (i.e.,  no  or  incomplete  prior  position  information),  the 
methods  presented  in  this  chapter  require  the  Kalman  filter  to  maintain  an  estimate 
of  the  three-dimensional  location  of  the  landmark.  By  maintaining  a  database  of 
these  landmarks,  their  feature  descriptors,  and  position  estimates,  this  information 
can  be  leveraged  when  revisiting  these  landmarks  and  resulting  in  absolute  position 
measurements.  In  other  words,  these  methods  allow  a  properly  equipped  sensor  to 
“learn”  it’s  environment  and  exploit  this  information  to  eliminate  a  major  source  of 
error.  This  concept  is  demonstrated  using  both  simulation  and  experimental  methods 
in  Chapter  VI. 

The  development  of  the  measurement  equations  for  a  previously  estimated  land¬ 
mark  is  identical  to  the  previous  section,  with  one  consideration.  The  pixel  location 
uncertainty  covariance  shown  in  Equation  (4.125),  assumes  that  the  navigation  state 
errors  and  the  landmark  location  errors  are  independent.  This  is  not  always  true 
when  revisiting  a  previously  tracked  landmark  as  the  cross-correlation  matrix  is  non¬ 
zero.  Fortunately,  the  errors  decorrelate  over  time  due  to  the  integration  of  noise 
sources  within  the  filter.  Thus,  care  must  be  taken  to  ensure  revisited  landmarks  are 
sufficiently  decorrelated  before  using  them  for  reference  landmark  updates. 
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4-4-3  Celestial  Measurements.  As  mentioned  in  Section  3.1.1,  celestial  bod¬ 
ies  represent  a  very  high-quality  source  of  optical  navigation  measurements.  The  key 
difference  between  celestial  measurements  and  landmark  measurements  is  the  great 
distance  to  celestial  bodies  relative  to  the  size  of  the  Earth. 

The  celestial  measurement  is  simply  the  current  pixel  location  of  an  identified 
celestial  object.  Given  a  celestial  object  n,  located  at  a  known  direction  relative  to  the 
Earth  (based  on  astronomical  almanac  data)  [25],  is  represented  by  the  unit  vector, 
s®  (tj),  at  time  tt  [16].  The  object  direction  unit  vector  in  the  camera  frame,  s(j(U),  is  a 
function  of  the  navigation  frame  orientation,  vehicle  orientation,  and  camera-to-body 
orientation: 


(4.129) 


where  C"  is  the  Earth  frame  to  navigation  frame  direction  cosine  matrix.  Using 
the  camera  model,  the  undistorted  pixel  location,  z(U),  is  a  function  of  the  camera 
projection  matrix,  T(”'x,  and  the  homogeneous  form  of  the  object  direction  unit  vector: 


Z(t4)  =  (U)  +  v(ii) 


(4.130) 


where  v(fj)  is  additive  white  Gaussian  noise. 

By  employing  the  techniques  presented  in  Section  4.3.3,  the  influence  matrix  is 
calculated  in  a  straightforward  manner.  The  influence  matrix  corresponding  to  the 
attitude  uncertainty  is  defined  as 


(4.131) 


Evaluating  the  partial  derivative  yields 


where 


dsc 

dil) 


-c£c‘ (c;rx) 


(4.133) 


The  influence  matrix  corresponding  to  the  camera  misalignment  is  defined  as 


H, 


dz 

da 


Evaluating  the  partial  derivative  yields 


xcds^  _xc  r<9sH 

_  rr\pix  z  da  l  da  J  ; 

za  —  J-c  /v  x2 


where 


f)  cc 

^  =  (cjcjcy) 


X 


All  other  influence  matrices  are  zero. 


(4.134) 


(4.135) 


(4.136) 


In  the  next  chapter,  a  practical  image  and  inertial  navigation  sensor  is  designed 
and  calibrated  using  the  algorithms  presented  in  the  previous  sections. 
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V.  Experimental  Navigation  System  Design  and  Calibration 


Due  to  the  complex  nature  of  the  image- aided  inertial  navigation  problem,  exper¬ 
imental  data  collections  are  critical  in  order  to  demonstrate  the  performance 
of  the  proposed  navigation  algorithm.  In  this  chapter,  the  design  and  calibration  of 
the  experimental  navigation  system  is  presented.  The  chapter  begins  by  outlining 
an  overall  system  architecture,  then  describes  the  sensors  and  their  respective  data 
products.  Finally,  the  calibration  and  tuning  methods  are  presented. 

5.1  Experimental  Image- Aided  Navigation  System  Overview 

The  image-aided  navigation  system  used  in  the  research  consists  of  a  navigation 
computer,  inertial  measurement  unit,  and  a  pair  of  digital  cameras.  As  shown  in  Fig¬ 
ure  5.1,  the  system  is  configured  in  two  ways,  depending  on  the  conditions  of  the  data 
collection.  In  the  ground  system,  the  cameras  are  mounted  level  to  the  sensor  plate 
with  approximately  22  cm  of  binocular  disparity.  In  the  flight  system,  the  cameras  are 
mounted  on  a  rigid  bracket  which  is  pointed  downward  by  approximately  45  degrees 
in  order  to  maintain  a  view  of  the  ground.  The  cameras  are  spaced  approximately 
44  cm  apart.  The  larger  binocular  disparity  increases  the  effective  range  for  reliably 
determining  the  landmark  distances  using  binocular  stereopsis. 

The  sensors  are  connected  to  a  navigation  computer  using  various  protocols.  A 
discrete  bus  is  used  to  synchronize  the  sensors  with  the  master  clock.  A  block  diagram 
of  the  system  is  shown  in  Figure  5.2.  The  details  of  the  system  components  are  given 
in  the  following  sections. 

5.1.1  Sensors.  As  shown  in  the  block  diagram,  the  sensor  test  platform 
consists  of  digital  cameras  and  inertial  measurement  units.  The  digital  cameras 
are  both  PixeLINK  PL-A741  machine  vision  cameras.  The  PL-A741  camera  uses 
a  monochrome  complementary  metal-oxide  semiconductor  (CMOS)  imaging  sensor 
with  1280x1024  pixel  (1.3  megapixel)  resolution.  The  pixels  are  6.7  ym  square  which 
results  in  a  sensor  size  of  8.576  mm  x  6.921  mm.  The  PL-A741  camera  accepts  stan- 
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(a) 


(b) 


Figure  5.1:  Image-aided  inertial  navigation  system.  The  navigation  system  consists 
of  a  tactical-grade  IMU,  consumer-grade  IMU,  and  two  monochrome  digital  cam¬ 
eras  mounted  to  a  rigid  plate.  Panel  (a)  shows  the  system  used  for  ground  testing. 
Panel  (b)  shows  the  system  as  installed  for  flight  testing. 
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Figure  5.2:  Navigation  system  block  diagram.  The  image-aided  inertial  navigation 
system  consists  of  tactical  and  consumer  grade  IMUs,  two  digital  cameras,  and  a 
navigation  computer.  The  sensors  are  synchronized  using  a  discrete  bus. 


127 


dard  C-mount  lenses.  The  camera  communicates  with  the  navigation  computer  using 
an  IEEE-1394a  (FireWire)  interface,  which  is  capable  of  transmitting  approximately 
3  frames  per  second  at  full  resolution.  The  PL-A741  includes  an  external  interface 
which  can  be  used  to  trigger  synchronized  frame  captures.  Finally,  the  camera  in¬ 
cludes  a  global  shutter  which  exposes  all  of  the  pixels  simultaneously.  This  avoids 
one  of  the  primary  limitations  of  standard,  line-scanned  video  sources.  The  complete 
PL-A741  specifications  are  listed  in  [49]. 

The  cameras  are  paired  with  Pentax/Cosmicar  C30405  C-mount  lenses.  These 
lenses  feature  a  manual  iris  which  ranges  from  full  closed  to  //1.8.  The  4.8  mm  focal 
length  results  in  a  horizontal  field  of  view  of  approximately  45  degrees. 

In  order  to  test  the  performance  of  the  system  with  variations  in  inertial  sensors, 
both  tactical  and  consumer-grade  IMUs  are  mounted  to  the  platform.  The  system 
is  configured  to  record  measurements  from  both  sensors  simultaneously  so  that  the 
same  navigation  profile  can  be  processed  using  each  of  the  sensors  separately. 

The  Honeywell  HG1700  is  a  tactical- grade,  strapdown  inertial  measurement 
unit.  The  unit  consists  of  a  GG1308  ring-laser  gyroscope  and  triad  of  RBA-500  ac¬ 
celerometers  which  produce  measurements  at  100  Hz.  The  gyroscopes  and  accelerom¬ 
eters  have  a  dynamic  range  of  ±  1000  degrees  per  second  and  ±  50  g,  respectively. 
Selected  specifications  for  the  HG1700  are  shown  in  Table  5.1. 

The  Crista  IMU  is  a  consumer-grade,  strapdown  inertial  measurement  unit. 
The  IMU  consists  of  micro-electrical  mechanical  systems  (MEMS)  gyroscopes  and  ac¬ 
celerometers  which  produce  measurements  at  100  Hz.  The  gyroscopes  and  accelerom¬ 
eters  have  a  dynamic  range  of  ±  300  degrees  per  second  and  ±  lOg,  respectively. 
Selected  specifications  for  the  Crista  IMU  are  shown  in  Table  5.1. 

5.2  Calibration  and  Alignment 

The  ultimate  goal  of  the  calibration  and  alignment  process  is  to  determine  the 
relative  orientation  between  an  imaging  and  inertial  sensor,  or  more  specifically,  the 
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Table  5.1:  Inertial  measurement  sensor  specifications  for  the  Cloud  Cap  Technol¬ 
ogy  Crista  consumer-grade  IMU  [6]  and  Honeywell  HG1700  tactical-grade  IMU  [22], 
The  parameters  noted  with  an  asterisk  are  not  included  in  the  specifications  and  are 
estimates. 


Parameter  (Units) 

Crista  IMU 

HG1700 

Sampling  interval  (ms) 

5.0 

10.0 

Gyro  bias  sigma  (deg /hr) 

1800 

1.0 

Gyro  bias  time  constant  (hr) 

2* 

2* 

Angular  random  walk  (deg/Vhr) 

2.23 

0.3 

Gyro  scalefactor  sigma  (PPM) 

10000 

150 

Accel  bias  sigma  (m/s2) 

0.196 

0.0098 

Accel  bias  time  constant  (hr) 

2* 

2* 

Velocity  random  walk  (m/ s/Vhr) 

0.261 

0.57* 

Accel  scalefactor  sigma  (PPM) 

10000 

300 

sensitive  axes  of  both  sensors.  The  process  can  be  divided  into  two  main  steps.  The 
first  step  consists  of  calibrating  the  optical  sensors  with  respect  to  unknown  optical 
properties.  The  final  step  is  to  determine  the  relative  location  and  orientation  between 
the  sensitive  axes  of  the  imaging  and  inertial  sensors.  Both  steps  are  presented  in 
detail  in  the  following  sections  and  address  some  limitations  found  in  the  current 
literature. 

5.2.1  Camera  Calibration.  The  camera  and  lens  system  must  be  properly 
calibrated  in  order  to  accurately  map  a  pixel  location  to  a  fine  of  sight  vector,  which  is 
critical  to  the  performance  of  the  system.  As  presented  in  Sections  2.7.3  and  2.7.2,  the 
camera/lens  model  consists  of  both  a  projection  model  based  on  a  pinhole  camera  with 
an  associated  nonlinear  correction  to  remove  the  effects  of  radial  optical  distortion 
caused  by  the  lens.  The  calibration  parameters  of  interest  are 


•  Horizonal  pixel  size  (sx) 

•  Vertical  pixel  size  (sy) 

•  Effective  focal  length  (/) 

•  Nonlinear  optical  distortion  parameters  (a2,  04) 
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Figure  5.3:  Sample  image  of  optical  calibration  array.  The  calibration  array  consists 
of  a  48  x  96  inch  board  with  reference  holes  drilled  at  one  inch  intervals.  The  effects 
of  the  optical  distortion  which  cause  straight  lines  to  appear  curved  are  visible  in  the 
image. 

•  Center  of  optical  distortion  (c) 

In  this  experiment,  the  specified  values  of  the  horizontal  and  vertical  pixel  size  pa¬ 
rameters  are  assumed  to  be  accurate  and  are  not  explicitly  estimated.  This  is  justified 
due  to  the  precise  nature  of  the  photolithographic  process  used  to  create  the  CMOS 
array. 

The  calibration  is  accomplished  by  imaging  a  series  of  surveyed  reference  points 
and  measuring  the  resulting  pixel  location  of  each  point.  In  order  to  maintain  observ¬ 
ability  of  all  parameters,  images  must  be  taken  at  varying  distances.  The  reference 
consists  of  a  flat  ,4x8  foot  board  with  calibration  holes  drilled  at  one  inch  intervals.  A 
sample  image  of  the  calibration  array  is  shown  in  Figure  5.3.  Note  the  effects  of  radial 
distortion  in  the  image,  which  cause  straight  lines  to  appear  curved  in  the  image. 

In  order  to  predict  the  pixel  location  of  each  calibration  point  in  the  image, 
the  relative  pose  of  the  camera  is  required.  Unfortunately,  the  true  pose  of  a  camera 
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is  difficult  to  independently  measure  to  the  relative  precision  required  to  properly 
observe  the  camera  calibration  parameters.  Thus  a  method  was  used  to  estimate  the 
pose  and  calibration  parameters  simultaneously.  In  estimation  terminology,  the  pose 
is  considered  a  nuisance  parameter  which  is  required  in  order  to  estimate  the  true 
parameters  of  interest. 

Because  the  cameras  are  arranged  for  binocular  imaging,  estimation  of  the  cali¬ 
bration  parameters  is  accomplished  concurrently.  Since  the  relative  pose  between  the 
cameras  is  constant  and  partially  known,  this  imposes  an  additional  constraint  on  the 
estimation  process.  Incorporation  of  the  additional  parameters  required  for  binocular 
estimation  results  in  the  following  parameter  list. 

•  Effective  focal  length  for  camera  a  (/„) 

•  Effective  focal  length  for  camera  b  (/*,) 

•  Nonlinear  optical  distortion  parameters  for  camera  a  (aa J 

•  Nonlinear  optical  distortion  parameters  for  camera  b  (a^) 

•  Center  of  optical  distortion  for  camera  a  (ca) 

•  Center  of  optical  distortion  for  camera  b  (c&) 

•  Location  of  camera  a  with  respect  to  the  binocular  reference  frame  (p 

•  Location  of  camera  b  with  respect  to  the  binocular  reference  frame  J 

•  Binocular  reference  frame  to  camera  a  frame  DCM  (C£“) 

•  Binocular  reference  frame  to  camera  b  frame  DCM  (C^) 

•  Location  of  binocular  reference  in  navigation  frame  for  image  j  (p”) 

•  Navigation  to  binocular  reference  frame  DCM  for  image  j  (C£° ) 

The  binocular  measurement  geometry  is  shown  in  Figure  4.11. 

The  resulting  nonlinear  regression  function  for  camera  a,  image  j,  is  given  by 


hi0(/«,c0,p”,C«.CS)-h2o(c 


-'at  ®a2 )  ®a4 )  Za7- 


=  0 


(5.1) 
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and  similarly  for  camera  b,  image  j 


hib(fb,  Cb,  p“,  C? ,  Cg)  -  h2l,(ct,  afa ,  at4 ,  z ?  )  =  0 


(5.2) 


where  zf/  represents  the  pixel  location  of  the  calibration  point  prior  to  removal  of  the 
optical  distortion. 


The  hi  function  represents  the  projection  of  the  calibration  point,  yn,  into  the 
image  plane  using  the  pinhole  camera  model.  This  function  is  defined  for  camera  a 
as 

sc“  (5. 

2x3 

where 


hla  = 


-fa/Sa 


0  fa/Sy  I 


^2x1 


-,Ca  — 


eg  [eg  (y»  -  p”)  -  pgmJ 


(5.4) 


The  h2  function  represents  the  effects  of  optical  distortion,  which  is  defined  in  Equa¬ 
tion  (2.91),  and  repeated  here  for  clarity  for  camera  a 


h 2a  =  (z a  -  ca)  (l  +  aa2r 2  +  aa4r4)  +  ca 


(5.5) 


where 

r  =  \\zda  -  ca||  (5.6) 

Thus,  a  nonlinear  regression  can  be  performed  by  matching  the  true  location  of  each 
calibration  point  with  its  projected  location  in  each  image. 

When  performing  this  calibration,  care  must  be  taken  to  promote  observability 
in  several  areas.  The  first  area  of  potential  unobservability  occurs  when  estimating 
the  nonlinear  optical  distortion  parameters,  since  both  are  functions  of  pixel  radius 
(one  in  r2  and  one  in  r4).  This  issue  is  mitigated  by  ensuring  the  calibration  points  are 
located  at  a  wide  variety  of  radial  distances  from  the  center  of  the  image.  The  second 
area  of  weak  observability  occurs  when  estimating  the  camera  orientation  and  location 
relative  to  the  binocular  reference  frame.  Assuming  the  unknown  orientation  of  the 
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cameras  relative  to  the  binocular  reference  frame  is  small,  the  relative  orientation 
between  the  cameras  is  estimated,  instead  of  each  orientation  separately.  In  addition, 
since  the  relative  location  of  the  cameras  is  well  known,  an  estimate  of  the  value 
is  not  needed  and  the  parameters  are  removed  from  the  regression.  Finally,  there 
is  unobservability  between  the  distance  component  of  the  position  vector  and  the 
estimated  focal  length.  This  is  mitigated  by  accurately  measuring  the  distance  to 
the  calibration  array  for  each  image  and  effectively  removing  this  variable  from  the 
regression. 

The  binocular  calibration  is  accomplished  at  three  measured  distances  from  the 
calibration  array.  A  sample  image  which  shows  the  prediction  errors  before  and  after 
the  calibration  process  is  shown  in  Figures  4(a)  and  4(b).  A  total  of  10397  measure¬ 
ments  of  calibrated  locations  are  used  to  estimate  the  intrinsic  binocular  parameters. 
The  resulting  mean  of  the  squared  pixel  matching  errors  was  0.84,  which  agreed  with 
the  expected  pixel  measurement  uncertainty  due  to  the  correlation  process. 

Once  the  binocular  calibration  has  been  completed,  the  next  step  is  to  esti¬ 
mate  the  relative  location  and  orientation  between  the  binocular  rig  and  the  inertial 
measurement  unit.  The  procedure  used  to  accomplish  this  is  presented  in  the  next 
section. 

5.2.2  Sensor  Alignment.  In  order  to  accurately  combine  the  measurements 
from  optical  and  inertial  sensors,  the  relative  position  and  orientation  between  the 
sensors  must  be  known.  More  precisely,  the  relative  position  and  orientation  must 
be  determined  between  the  sensitive  axes  of  the  respective  sensors.  In  this  section, 
a  method  is  proposed  which  estimates  these  parameters  by  combining  independent 
truth  information  with  imaging  and  inertial  sensor  measurements  of  features  with 
known  relative  locations. 

The  method  consists  of  two  steps,  which  may  be  treated  independently.  The  first 
task  is  to  estimate  the  location  and  orientation  of  the  IMU  center  by  combining  IMU 
measurements  with  a  position  and  orientation  source  of  appropriate  accuracy  (e.g., 
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(a)  Before  calibration. 
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(b)  After  calibration. 


Figure  5.4:  Binocular  camera  calibration.  This  pair  of  sample  images  illustrates 

the  improvements  achieved  during  the  camera  calibration.  The  magenta  plus  symbols 
indicate  the  predicted  locations  of  the  calibration  points  and  the  blue  crosses  indicate 
the  actual  locations  of  the  calibration  points  in  the  image. 


134 


GPS  or  high-quality  stationary  alignment  [54]).  In  this  experiment,  differentially- 
corrected  GPS  measurements  are  optimally  combined  with  the  inertial  measurements 
in  order  to  produce  an  accurate  position  and  orientation  estimate  (with  associated 
uncertainty  covariance)  at  times  corresponding  with  image  capture  events. 

The  second  task  is  to  measure  the  pixel  locations  of  features  with  known  physical 
location  relative  to  the  navigation  frame  from  a  single  or  multiple  images.  This 
relationship  is  governed  by  the  pixel  measurement  equation: 

z  =  h(CS,Cf  ,C*,y",p”,pS,pSj  (5.7) 

where  the  body  to  binocular  frame  DCM,  C[]° ,  and  the  relative  location  vector  be¬ 
tween  the  body  and  binocular  frame,  Pq,  are  the  object  parameters  of  the  nonlinear 
regression.  More  explicitly,  the  pixel  measurement  equation  is  given  by 

z  =  T£fsc“  (5.8) 

where 

S*  =  CS(C?  [C‘  (y»  -  p»)  -  pS]  -  p£m)  (5.9) 

As  discussed  in  the  previous  section,  there  are  observability  issues  which  must 
be  understood  in  order  to  ensure  an  accurate  estimate.  The  first  issue  is  the  limited 
visibility  of  the  relative  location  vector.  This  issue  is  mitigated  by  measuring  the 
relative  location  between  the  two  sensors  and  leaving  the  relative  location  out  of  the 
estimation  process.  The  next  issue  regarding  observability  is  related  to  the  funda¬ 
mental  measurement  accuracy  of  the  sensors,  primarily  the  measurement  accuracy 
of  the  gyroscope  and  the  camera  angular  accuracy.  This  issue  is  investigated  using 
simulations  of  inertial  sensors  with  varying  accuracy. 

A  qualitative  estimate  of  achievable  camera  to  body  alignment  accuracy  for 
various  inertial  sensor  models  as  a  function  of  gyroscope  random  walk  is  shown  in 
Figure  5.5.  As  expected,  improving  the  quality  of  the  inertial  sensor  yields  improved 
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Table  5.2:  Camera  to  inertial  alignment  accuracy  accelerometer  /  gyroscope  per¬ 

formance  sensitivity.  Theoretical  alignment  performance  (in  1  —  a  RMS  arcseconds) 
is  shown  as  a  function  of  various  combinations  of  sensors  from  the  Crista,  HG1700, 
and  HG9900  IMUs  after  completing  a  multi-angle,  stationary  alignment  profile. 


HG9900  Gyro 

HG1700  Gyro 

Crista  Gyro 

HG9900  Accel 

11.1” 

22.7” 

41.3” 

HG1700  Accel 

11.6” 

45.4” 

-4 
i— * 

oo 

Crista  Accel 

12.2” 

68.1” 

291.2” 

alignment,  up  to  the  accuracy  of  the  image  sensor  to  resolve  the  feature  locations. 
In  this  case,  the  camera  alignment  estimate  for  the  navigation  grade  IMU  (HG9900) 
is  limited  by  the  image  measurements.  This  could  be  improved  by  increasing  the 
angular  sampling  frequency  of  the  imaging  sensor  [15].  A  sensitivity  analysis  is  per¬ 
formed  by  creating  theoretical  combinations  of  accelerometer  and  gyroscopic  sensors 
and  estimating  the  boresight  alignment  accuracy  using  the  standard  alignment  pro¬ 
file.  The  results,  shown  in  Table  (5.2),  indicate  the  performance  of  the  gyroscope 
has  a  stronger  contribution  to  the  achievable  boresight  accuracy,  although  both  sen¬ 
sors  influence  the  ultimate  performance  due  to  coupling  between  accelerometer  and 
gyroscopic  error  states. 

These  results  illustrate  the  relationship  between  the  accuracy  of  the  optical  and 
inertial  sensors  and  the  ability  to  determine  the  relative  orientation  of  the  sensors. 
More  fundamentally,  even  if  a  mechanical  or  other  method  existed  to  provide  a  “bet¬ 
ter”  alignment  (e.g.,  relative  to  the  “case”),  it  would  be  impossible  to  prove  as  the 
sensor  measurements  limit  the  observability  of  the  relative  alignment. 

In  the  next  chapter,  the  mathematical  model  presented  in  Chapter  IV  is  vali¬ 
dated  using  simulation  and  experimental  methods. 
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Figure  5.5:  Typical  camera-to-inertial  alignment  accuracy  for  various  grades  of 

IMU.  Specific  results  are  shown  for  the  Crista,  HG1700,  and  HG9900  IMUs  after 
completing  a  multi-angle,  stationary  alignment  profile. 
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VI.  Simulation  and  Experimental  Results 

In  order  to  validate  the  mathematical  development  presented  in  Chapter  IV,  the 
algorithms  are  evaluated  using  both  simulation  and  experimental  methods.  An  in¬ 
cremental  testing  philosophy  is  used  to  validate  the  accuracy  of  the  stochastic  feature 
projection  algorithms  prior  to  evaluating  the  system  navigation  performance.  This 
chapter  is  arranged  as  follows.  First,  the  stochastic  projection  algorithms  are  tested 
using  a  Monte  Carlo  simulation  and  demonstrated  with  flight  test  data.  Next,  the 
image  and  inertial  fusion  algorithm  is  tested  using  a  ground  simulation  and  verified 
with  a  ground  test.  Finally,  the  image  and  inertial  fusion  algorithm  is  tested  in  a 
simulated  flight  environment. 

6.1  Stochastic  Projection  Algorithm  Tests 

This  experiment  validates  the  stochastic  projection  method  using  both  simu¬ 
lated  and  real  data  collected  from  an  airborne  system.  A  Monte  Carlo  simulation  of 
the  test  flight  is  performed  to  validate  the  stochastic  projection  model  with  respect  to 
a  statistically  significant  sampling  of  random  error  contributors.  The  flight  test  data 
are  used  to  validate  the  performance  of  the  algorithm  in  a  real-world  environment.  In 
the  next  section,  the  simulation  parameters  and  results  are  presented. 

6.1.1  Monte  Carlo  Simulation.  The  performance  of  the  stochastic  projec¬ 
tion  method  presented  in  Section  4.2.2  is  verified  using  a  statistically  representative 
ensemble  of  sample  functions  (300  per  run).  The  data  collection  system  used  on  the 
experimental  test  flights  is  simulated  in  software,  based  on  a  reference  trajectory  cho¬ 
sen  to  generate  an  interesting  observation  geometry.  This  constant-altitude  circular 
flight  path  is  constructed  such  that  a  fixed  terrain  patch  remained  in  the  camera  field 
of  view  throughout  the  flight.  The  simulated  aircraft  speed  is  150  meters-per-second, 
altitude  is  2296  meters,  and  bank  angle  is  27  degrees  which  describes  a  circular  flight 
path  with  4592  meter  radius.  The  resulting  slant  range  to  the  landmark  is  5134  me¬ 
ters. 
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The  Profgen  trajectory  generation  software  tool  [40]  is  used  to  create  the  refer¬ 
ence  trajectory,  and  the  resulting  angular  rate  and  specific  force  profiles.  Simulated 
inertial  measurements  are  generated  by  subjecting  the  true  angular  rate  and  specific 
force  profiles  to  randomly  generated  measurement  errors,  with  error  statistics  based 
on  the  respective  inertial  sensor  model  shown  in  Table  5.1. 

The  terrain  elevation  is  simulated  as  a  zero-mean,  Gaussian  random  variable. 
Simulations  are  accomplished  using  a  terrain  elevation  error  standard  deviation  of 
25  meters,  representing  a  moderate  accuracy  terrain  model.  While  sample  images 
are  not  created,  simulated  landmarks  are  randomly  dispersed  over  the  terrain,  with 
randomized  elevations  consistent  with  the  terrain  elevation  error  standard  deviation. 
All  simulations  use  a  10-second  interval  between  two  images,  which  is  equivalent  to 
18.7  degrees  of  arc  in  the  horizontal  plane.  The  simulation  geometry  is  shown  in 
Fig.  6.1. 

The  results  are  shown  in  Fig.  6.2.  In  this  figure,  the  predicted  pixel  location 
errors  for  each  Monte  Carlo  sample  function  are  represented  by  a  “plus”  symbol.  The 
predicted  2-cr  pixel  location  error  bound  is  indicated  by  a  line.  Note  the  inclined 
elliptical  nature  of  the  2 -a  bound  is  a  function  of  the  trajectory  and  measurement 
geometry. 

The  same  predicted  pixel  location  errors  are  shown  referenced  to  a  256x256  pixel 
image  in  Fig.  6.3.  The  stochastic  constraint  method  shows  a  small  correspondence 
search  area  which  gives  the  highest  probability  of  the  landmark  location.  In  this  case, 
the  stochastic  constraint  method  is  an  improvement  over  the  epipolar  line  search 
method  [62]  as  it  provides  a  smaller  search  area  developed  using  a  statistical  model. 
By  observing  the  resulting  elliptical  geometry  of  the  representative  search  space,  it 
can  be  seen  that  the  statistical  search  space  will  not  be  larger  than  an  equivalently 
width-constrained  epipolar  search  space.  In  general  practice,  the  search  space  will  be 
smaller,  especially  when  additional  feature  locator  constraints  are  enforced  (e.g.,  scale 
and  rotation). 
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Figure  6.1:  Simulated  flight  path.  In  order  to  generate  a  good  observation  geometry, 
the  circular  orbit  is  chosen  such  that  a  fixed  terrain  patch  remains  in  the  camera  field 
of  view  throughout  the  flight. 
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Figure  6.2:  Landmark  pixel  location  error  and  predicted  2-cr  bound  for  25  meter 

terrain  elevation  uncertainty.  Note  the  actual  pixel  location  errors  are  similar  to  the 
predicted  error  bound.  Note:  X  and  Y  axes  have  differing  scales  to  show  detail. 


Figure  6.3:  Landmark  pixel  location  error  and  predicted  2-cr  bound  for  25  meter 

terrain  elevation  uncertainty  referenced  to  a  256x256  pixel  image.  Note  the  stochastic 
constraint  can  limit  the  correspondence  search  area  significantly  compared  to  a  search 
near  the  epipolar  line. 
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Figure  6.4:  Northrop  T-38  instrumented  with  synchronized  digital  video  camera 

and  inertial  navigation  system. 

6.1.2  Flight  Test  Experiment.  In  this  section,  the  stochastic  projection 
method  is  implemented  using  image  and  inertial  flight  data  collected  on  a  Northrop 
T-38  “Talon”  aircraft.  The  aircraft  was  equipped  with  a  day-night  monochrome  digital 
video  camera  synchronized  to  a  Honeywell  H-764G  Inertial  Navigation  System.  The 
camera  was  mounted  in  the  cockpit,  pointing  out  the  right  wing.  Flight  data  were 
collected  in  Fall  of  2002  at  Edwards  Air  Force  Base,  California. 

The  aircraft  state  dynamics  are  a  function  of  the  measurements  from  the  strap- 
down  inertial  sensors.  All  states  are  estimated  in  the  Earth-centered  Earth-fixed  refer¬ 
ence  frame  previously  defined.  The  error  equations  were  developed  based  on  [64,71]. 
For  this  example,  a  three  image  sequence  from  a  right  turning  profile  is  shown  in 
Fig.  6.5.  The  results  of  the  above  method  for  predicting  the  future  target  location 
and  uncertainty  are  shown  in  Fig.  6.6. 

The  target  selected  is  the  west  corner  of  a  building  shown  in  Fig.  6.5.  The 
estimated  target  location  and  2-a  variance  shown  in  Fig.  6.6  shows  an  predicted  ellip¬ 
soidal  uncertainty  after  one  second  and  seven  seconds  of  flight.  Note  the  uncertainty 
ellipse  increases  with  flight  time,  as  expected.  In  each  case,  incorporating  camera  mo¬ 
tion  information  can  constrain  the  correspondence  search  space  significantly.  Note  the 
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Figure  6.5:  Three  image  sequence  of  an  industrial  area  recorded  during  a  T-38 

flight,  with  a  sample  stationary  ground  landmark  identified.  Image  (b)  was  taken 
1  second  after  image  (a).  Image  (c)  was  taken  7  seconds  after  image  (a).  The  aircraft 
is  in  a  right  turn  approximately  3.8  kilometers  from  the  landmark. 

true  landmark  location  remains  consistent  with  the  predicted  2-<r  uncertainty  ellipse 
in  the  presence  of  real  measurement  noise  and  terrain  model  errors. 

6.2  Image  and  Inertial  Fusion  Algorithm  Ground  Tests 

After  demonstrating  the  validity  of  the  stochastic  feature  projection  theory,  the 
algorithm  is  incorporated  into  the  extended  Kalman  filter  navigation  algorithm.  The 
initial  experiments  are  conducted  using  the  sensor  platform  shown  in  Figure  5.1(a). 
This  system  is  equipped  with  a  tactical-grade  IMU  and  a  binocular  pair  of  Pix- 
eLINK  cameras.  The  cameras  are  mounted  level  with  the  platform  with  approxi¬ 
mately  22  centimeters  of  separation. 

The  imaging  and  inertial  fusion  navigation  algorithm  is  evaluated  using  both 
simulated  and  experimental  ground  profiles.  The  profiles  are  designed  to  provide 
a  range  of  image  types  in  order  to  exercise  the  feature  tracking  algorithm.  The 
simulation  and  results  are  presented  in  the  next  section. 
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Base  Image  (2X  Zoom) 


Image  1  (2X  Zoom) 


Image  2  (2X  Zoom) 


Figure  6.6:  Predicted  landmark  location  uncertainty  using  stochastic  projection 

method.  The  landmark  selected  is  the  west  corner  of  a  building  in  the  base  image  (a), 
represented  by  the  crosshair.  Using  the  stochastic  projection  method,  the  landmark 
mean  and  2-a  variance  is  projected  into  two  subsequent  images  to  demonstrate  the 
concept.  The  estimated  landmark  location  and  predicted  2-a  variance  for  image 
(b)  shows  an  ellipsoidal  uncertainty  after  one  second  of  flight.  Image  (c)  shows  a 
further  increase  in  the  uncertainty  after  seven  seconds  of  flight.  In  each  subsequent 
image,  constraining  the  correspondence  search  for  the  landmark  to  the  ellipsoidal 
region  reduces  the  required  search  area  and  would  eliminate  false  matches  with  other 
features  with  a  similar  appearance  (e.g.,  other  building  corners). 
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6.2.1  Simulation.  The  algorithm  is  tested  using  a  Monte  Carlo  simulation 
of  a  standard  indoor  profile.  The  profile  consists  of  a  straight  corridor,  designed  to 
be  similar  to  the  indoor  experimental  data  collection  presented  in  Section  6.2.2. 

An  accurate  simulation  of  the  navigation  environment  requires  simulating  the 
performance  of  the  sensors  in  response  to  a  true  motion  trajectory.  The  trajectory 
was  generated  using  Profgen  version  8.19  software  package  [40].  For  each  Monte  Carlo 
navigation  simulation  run,  the  inertial  sensor  measurements  are  generated  using  the 
true  trajectory  and  an  inertial  sensor  error  model. 

Because  of  the  inherent  complexity  of  the  optical  environment,  it  is  beyond  the 
scope  of  this  research  to  generate  simulated  images.  Instead,  a  simulated  feature  set  is 
created  by  randomly  distributing  features  along  a  corridor  surrounding  the  true  tra¬ 
jectory.  The  features  are  each  given  random  descriptor  vectors  in  order  to  exercise  the 
feature  tracking  algorithm.  While  this  optical  simulation  method  is  appropriate  for 
testing  the  image  and  inertial  fusion  algorithm,  the  results  are  not  directly  comparable 
to  the  real  system  performance,  because  imaging  issues  such  as  lighting  conditions, 
motion  blur,  and  affine  changes  in  the  feature  descriptor  due  to  pose  changes  are  not 
modeled. 

The  simulated  corridor  is  3  meters  wide,  3  meters  high,  and  approximately  300 
meters  long.  Features  are  randomly  generated  on  the  walls,  floor  and  ceiling  of  the 
corridor  with  an  average  spacing  of  0.25  features  per  square  meter.  Each  feature 
is  given  a  random  primary  length  and  orientation,  which,  combined  with  the  true 
pose  of  the  sensor,  results  in  accurately  simulated  scale  and  orientation  parameters  in 
feature  space.  After  a  60-second  stationary  alignment,  the  sensor  platform  accelerates 
to  0.5  meters  per  second,  maintains  this  velocity  until  the  end  of  the  corridor,  then 
accelerates  to  a  stop  at  the  end.  The  platform  remains  stationary  for  60  seconds  after 
coming  to  a  stop.  This  results  in  a  660-second  image  and  inertial  navigation  profile. 
Simulated  images  are  collected  at  2  Hz. 
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A  Monte  Carlo  simulation  is  conducted  using  inertial  sensor  models  representing 
the  H1700  tactical-grade  IMU  and  the  Crista  consumer-grade  IMU.  Each  simulation 
consists  of  60  runs,  each  with  randomly  generated  inertial  measurement  errors  due  to 
random  walks,  sensor  bias,  and  sensor  scale-factor  errors.  In  order  to  mitigate  any 
potential  effects  due  to  the  location  of  the  features  in  the  simulated  environment,  the 
feature  locations  and  descriptors  are  randomly  generated  every  20  runs. 

The  position,  velocity,  and  attitude  errors  using  the  tactical-grade  IMU  are 
shown  in  Figures  6. 7-6. 9.  As  expected,  the  inertial  and  imaging  measurement  errors 
accumulate,  resulting  in  position  and  attitude  drift. 

The  position,  velocity,  and  attitude  errors  using  the  consumer-grade  IMU  are 
shown  in  Figures  6.10-6.12.  In  addition  to  proportionally  larger  errors  in  position, 
velocity,  and  attitude,  significant  excursions  are  noticed,  especially  in  the  attitude 
channels. 

Root-sum-squared  (RSS)  errors  are  analyzed  in  order  to  provide  a  one-dimensional 
metric  for  a  more  direct  comparison  of  the  simulated  system  performance  for  different 
profiles.  The  RSS  errors  comparing  the  free-inertial  and  image-aided  performance 
is  shown  in  Figures  6.13-6.15.  Over  the  10- minute  indoor  profile,  incorporating  the 
image-aiding  measurements  improves  the  errors  for  both  the  consumer  and  tactical- 
grade  sensors  by  many  orders  of  magnitude.  In  addition,  the  image-aided  consumer- 
grade  sensor  nearly  equals  the  position  error  performance  of  the  image-aided  tactical- 
grade  sensor,  until  the  consumer-grade  sensor  begins  to  show  some  attitude  diver¬ 
gences  after  approximately  400  seconds. 

In  this  section,  the  proposed  image  and  inertial  navigation  system  is  tested 
using  a  simulated  indoor  profile.  The  Monte  Carlo  simulation  predicts  dramatic  im¬ 
provements  when  fusing  image  and  inertial  measurements  for  both  consumer  and 
tactical-grade  inertial  sensors,  which  is  an  indicator  of  the  significant  navigation  in¬ 
formation  provided  by  properly  integrated  image  updates.  In  the  next  section,  the 
experimental  data  collection  profiles  and  results  are  presented. 
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Figure  6.7:  Simulated  60-run  Monte  Carlo  position  error  results  for  indoor  profile 

with  a  tactical-grade  inertial  sensor  and  image  aiding  using  landmarks  of  opportu¬ 
nity.  The  position  error  sample  functions  are  indicated  by  blue  dotted  lines.  The 
ensemble  mean  and  standard  deviation  are  indicated  by  the  green  and  red  solid  lines, 
respectively. 
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Figure  6.8:  Simulated  60-run  Monte  Carlo  velocity  error  results  for  indoor  profile 

with  a  tactical-grade  inertial  sensor  and  image  aiding  using  landmarks  of  opportu¬ 
nity.  The  velocity  error  sample  functions  are  indicated  by  blue  dotted  lines.  The 
ensemble  mean  and  standard  deviation  are  indicated  by  the  green  and  red  solid  lines, 
respectively. 
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Figure  6.9:  Simulated  60-run  Monte  Carlo  attitude  error  results  for  indoor  profile 
with  a  tactical-grade  inertial  sensor  and  image  aiding  using  landmarks  of  opportu¬ 
nity.  The  attitude  error  sample  functions  are  indicated  by  blue  dotted  lines.  The 
ensemble  mean  and  standard  deviation  are  indicated  by  the  green  and  red  solid  lines, 
respectively. 
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Figure  6.10:  Simulated  60- run  Monte  Carlo  position  error  results  for  indoor  profile 
with  a  consumer-grade  inertial  sensor  and  image  aiding  using  landmarks  of  oppor¬ 
tunity.  The  position  error  sample  functions  are  indicated  by  blue  dotted  lines.  The 
ensemble  mean  and  standard  deviation  are  indicated  by  the  green  and  red  solid  lines, 
respectively. 
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Figure  6.11:  Simulated  60-run  Monte  Carlo  velocity  error  results  for  indoor  profile 
with  a  consumer-grade  inertial  sensor  and  image  aiding  using  landmarks  of  oppor¬ 
tunity.  The  velocity  error  sample  functions  are  indicated  by  blue  dotted  lines.  The 
ensemble  mean  and  standard  deviation  are  indicated  by  the  green  and  red  solid  lines, 
respectively. 
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Figure  6.12:  Simulated  60-run  Monte  Carlo  attitude  error  results  for  indoor  profile 
with  a  consumer-grade  inertial  sensor  and  image  aiding  using  landmarks  of  oppor¬ 
tunity.  The  attitude  error  sample  functions  are  indicated  by  blue  dotted  lines.  The 
ensemble  mean  and  standard  deviation  are  indicated  by  the  green  and  red  solid  lines, 
respectively. 
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Figure  6.13:  Simulated  60-run  Monte  Carlo  root-sum-squared  (RSS)  horizontal 

position  error  for  indoor  profile  using  both  consumer-grade  and  tactical-grade  inertial 
sensors.  The  results  are  shown  for  four  cases:  1)  consumer-grade  free  inertial,  2) 
consumer-grade  opportunity  landmark  tracking,  3)  tactical-grade  free  inertial,  and  4) 
tactical-grade  opportunity  landmark  tracking. 
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Figure  6.14:  Simulated  60- run  Monte  Carlo  root-sum-squared  (RSS)  velocity  error 
for  indoor  profile  using  both  consumer- grade  and  tactical- grade  inertial  sensors.  The 
results  are  shown  for  four  cases:  1)  consumer-grade  free  inertial,  2)  consumer-grade 
opportunity  landmark  tracking,  3)  tactical-grade  free  inertial,  and  4)  tactical-grade 
opportunity  landmark  tracking. 
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Figure  6.15:  Simulated  60- run  Monte  Carlo  root-sum-squared  (RSS)  attitude  error 
for  indoor  profile  using  both  consumer- grade  and  tactical- grade  inertial  sensors.  The 
results  are  shown  for  four  cases:  1)  consumer-grade  free  inertial,  2)  consumer-grade 
opportunity  landmark  tracking,  3)  tactical-grade  free  inertial,  and  4)  tactical-grade 
opportunity  landmark  tracking. 
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Figure  6.16:  Outdoor  ground  data  collection  profile.  The  image-aided  inertial  nav¬ 
igation  sensor  is  tested  by  traversing  a  U-shaped  path  and  returning  to  the  start 
position. 

6.2.2  Experiment.  The  algorithm  is  tested  experimentally  using  two  ground 
navigation  profiles  designed  to  examine  the  operation  of  the  feature  tracking  system  in 
a  real-world  environment.  The  first  profile  consisted  of  a  closed  path  over  an  outdoor 
parking  area,  shown  in  Figure  6.16.  The  outdoor  path  was  traversed  forward  and 
backwards  with  the  camera  pointed  toward  the  outside  of  the  path.  This  trajectory 
resulted  in  seven  segments  that  presented  a  scene  change  and  forced  the  filter  to  search 
for  new  features.  This  outdoor  scene  consisted  of  a  combination  of  man-made  features 
(buildings,  fences,  roads,  etc.)  and  natural  features  such  as  grass  and  trees.  The 
profile  began  with  a  10-minute  stationary  alignment  period,  followed  by  four  minutes 
of  navigation  using  only  images  and  inertial  measurements.  No  prior  knowledge  was 
used  with  any  feature.  The  filter  was  limited  to  a  maximum  of  ten  features  at  any 
time.  A  sample  image  from  the  outdoor  profile  is  shown  in  Figure  6.17. 

This  profile  presents  the  algorithm  with  a  challenging  feature  tracking  environ¬ 
ment  due  to  the  high-contrast  lighting  conditions,  large  variation  in  feature  distance 


156 


Figure  6.17:  Sample  image  from  outdoor  data  collection.  The  outdoor  data  collec¬ 
tion  presents  the  filter  with  a  combination  of  man-made  and  natural  features.  The 
crosses  and  ellipses  indicate  the  locations  and  uncertainty  of  currently  tracked  land¬ 
marks. 
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(zero  to  infinite),  and  complicated  images  with  semi-transparent  objects  overlapping 
at  different  ranges  (e.g.,  multiple  layers  of  tree  limbs). 

The  filter  successfully  utilizes  inertial  measurements  to  predict  and  constrain 
the  image  correspondence  search  during  the  entire  profile.  In  return,  the  feature 
correspondence  updates  and  corrects  the  inertial  measurement  errors  and  significantly 
reduces  the  resulting  drift  in  the  navigation  solution.  Over  the  four-minute  non¬ 
stationary  profile,  the  navigation  errors  are  estimated  to  be  less  than  1  meter  in  the 
horizontal  plane  and  less  than  3  meters  in  the  vertical,  when  the  image-aided  inertial 
algorithm  is  enabled.  Typical  free-inertial  performance  for  this  inertial  sensor  is  on 
the  order  of  thousands  of  meters  horizonal  error.  In  addition,  the  unstable  nature  of 
the  vertical  channel  during  free-inertial  navigation  requires  external  aiding  in  order 
to  maintain  stability  [64] .  These  initial  results  clearly  show  the  benefits  of  the  image- 
aided  inertial  navigation  method. 

The  second  profile  consisted  of  a  closed  path  in  an  indoor  environment.  The  path 
began  and  ended  at  the  same  location  and  orientation  in  the  Advanced  Navigation 
Technology  (ANT )  Center  laboratory,  at  the  Air  Force  Institute  of  Technology.  As  in 
the  previous  profile,  the  data  collection  began  with  a  10-minute  stationary  alignment 
period.  After  the  alignment  period,  the  sensor  was  moved  in  a  10-minute  loop  around 
the  hallways  of  the  building.  In  contrast  to  the  previous  profile,  the  sensor  was  pointed 
primarily  in  the  direction  of  travel.  No  prior  knowledge  was  provided  to  the  algorithm 
regarding  the  location  of  features  or  structure  of  the  environment.  A  sample  image 
from  the  indoor  profile  is  shown  in  Figure  6.18. 

The  indoor  profile  presents  the  algorithm  with  different  challenges  from  a  fea¬ 
ture  tracking  perspective.  The  indoor  environment  consists  of  repetitive,  visually 
identical  features  (e.g.,  floor  tiles,  lights,  etc.),  which  can  easily  cause  confusion  for 
the  feature  tracking  algorithm.  In  addition,  reflections  from  windows  and  other  shiny 
surfaces  might  not  be  interpreted  properly  by  the  filter  and  could  potentially  result 
in  navigation  errors.  Finally,  the  lower  light  intensity  levels  and  large  areas  with  poor 
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Figure  6.18:  Sample  image  from  indoor  data  collection.  The  indoor  data  collection 
presents  the  filter  with  man-made  features  in  an  office  environment.  The  crosses  and 
ellipses  indicate  the  locations  and  uncertainty  of  currently  tracked  landmarks. 

contrast  (e.g.,  smooth,  featureless  walls)  presents  a  relatively  stark  feature  space.  The 
indoor  profile  is  performed  twice  for  both  the  tactical  and  navigation-grade  sensors. 

As  with  the  previous  profile,  the  filter  performs  well.  The  filters’  estimates  of 
the  trajectories  are  overlayed  on  a  floor  plan  of  the  building  in  Figures  6.19  and  6.20 
for  the  tactical  and  consumer-grade  inertial  sensors,  respectively.  In  each  figure,  a 
comparison  is  made  between  the  fused  image- aided  inertial  trajectory  estimate,  the 
image-aided  inertial  trajectory  with  stochastic  constraints  disabled,  and  a  free  inertial 
trajectory.  For  both  tactical  and  consumer- grade  sensors,  the  estimated  trajectory 
generally  corresponds  to  the  building’s  hallways,  with  excursions  of  less  than  3  me¬ 
ters.  In  addition,  the  results  of  the  free-inertial  trajectories  show  the  inherent  lack 
of  accuracy  of  the  inertial  sensor.  With  stochastic  constraints  purposely  disabled, 
the  trajectory  estimates  show  relatively  large  trajectory  errors  due  to  false  correspon¬ 
dence  matches.  This  illustrates  the  catastrophic  effects  of  incorporating  false  updates 
into  an  Extended  Kalman  Filter  with  inertial  feedback  and  demonstrates  the  inher- 
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ent  strength  of  applying  robust  correspondence  methods,  and  in  particular  stochastic 
constraints. 

The  filter’s  estimated  trajectory  for  the  image-aided  consumer-grade  sensor  (run 
two)  is  examined  in  more  detail  in  Figure  6.23,  where  the  estimated  location  of  land¬ 
marks  used  for  tracking  are  highlighted.  Note  the  landmarks  correspond  to  the  build¬ 
ing  walls,  ceilings,  and  floors.  More  detail  of  the  start/stop  area  is  shown  in  Fig¬ 
ure  6.22.  A  comparison  of  all  image-aided  inertial  navigation  results  for  both  the 
tactical  and  consumer- grade  sensors  is  shown  in  Figure  6.21.  The  difference  in  the 
estimated  start  and  stop  locations  shows  the  accumulated  errors  in  the  filter  over  the 
path.  Over  the  10-minute  profile,  the  path  closure  errors  are  less  than  5  m  in  the 
horizontal  plane  and  less  than  5  m  in  the  vertical  for  all  sensors.  Again,  this  is  a 
significant  improvement  over  the  free-inertial  performance.  The  results  are  particu¬ 
larly  impressive  as  the  solution  is  calculated  using  an  online  algorithm,  with  only  raw 
inertial  and  image  data.  No  a-priori  knowledge  of  the  environment  is  provided  to  the 
system. 

6.3  Image  and  Inertial  Fusion  Algorithm  Flight  Tests 

In  addition  to  the  ground  tests,  the  image  and  inertial  fusion  algorithm  is  evalu¬ 
ated  using  a  flight  profile.  The  flight  environment  presents  a  different  set  of  challenges 
to  the  system  than  the  indoor  environment.  First,  the  dynamic  range  from  the  sensor 
to  landmarks  is  large,  which  limits  the  usefulness  of  binocular  ranging.  While  binoc¬ 
ular  measurements  can  provide  reliable  distance  estimates  for  close  landmarks,  they 
are  ineffective  at  longer  distances.  Fortunately,  accurate  terrain  models  are  avail¬ 
able  to  a  high  level  of  fidelity.  Incorporating  the  terrain  elevation  into  the  landmark 
location  estimation  eliminates  the  need  for  binocular  ranging  at  higher  altitudes.  Sec¬ 
ondly,  the  flight  environment  would  typically  imply  longer  distance  navigation  than 
indoor  conditions.  Finally,  due  to  the  worldwide  availability  of  high-quality  satellite 
imagery,  there  is  an  existing  set  of  reference  features  which  could  be  exploited  in  the 
long-distance  navigation  problem. 
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Figure  6.19:  Estimated  path  from  indoor  data  collection  (run  two)  using  tactical- 

grade  inertial  sensor.  The  filter’s  estimate  of  the  path  (indicated  by  the  solid  line) 
agrees  well  with  the  known  path  (indicated  by  the  dotted  line) .  The  inertial-only  best 
estimate  of  trajectory  (indicated  by  the  dash-dotted  line)  and  image- aided  inertial 
with  stochastic  constraints  disabled  (indicated  by  the  dashed  line)  show  large  errors 
in  position  and  heading.  The  inertial-only  solution  exceeds  the  scale  of  the  image 
after  156  seconds. 
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Figure  6.20:  Estimated  path  from  indoor  data  collection  (run  two)  using  consumer- 
grade  inertial  sensor.  The  filter’s  estimate  of  the  path  (indicated  by  the  solid  line) 
agrees  well  with  the  known  path  (indicated  by  the  dotted  line) .  The  inertial-only  best 
estimate  of  trajectory  (indicated  by  the  dash-dotted  line)  and  image- aided  inertial 
with  stochastic  constraints  disabled  (indicated  by  the  dashed  line)  show  large  errors 
in  position  and  heading.  The  inertial-only  solution  exceeds  the  scale  of  the  image 
after  11  seconds. 
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Approximate  reference  path 
—  Tactical-grade  run  1 
■  -  Tactical-grade  run  2 
—  Consumer-grade  run  1 
|- Consumer-grade  run  2 


Figure  6.21:  Performance  comparison  for  image- aided  tactical  and  consumer- grade 
inertial  sensors  for  indoor  profile  runs  one  and  two.  The  common  start  and  stop 
location  for  both  runs  is  indicated  by  the  “o”  symbol.  The  estimated  stop  location 
for  each  run  is  indicated  by  an  “x”  symbol. 
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Approximate  reference  path 
Tactical-grade  run  1 
Tactical-grade  run  2 
Consumer-grade  run  1 
Consumer-grade  run  2 


Figure  6.22:  Enhanced  detail  of  the  start/stop  area  illustrating  the  estimated  tra¬ 
jectory  and  feature  locations  for  both  the  image-aided  tactical  and  consumer-grade 
inertial  sensors  for  indoor  profile  runs  one  and  two.  The  difference  between  the  esti¬ 
mated  start  and  stop  location  illustrates  the  accumulated  position  error. 
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a)  Entire  path  estimate 


b)  First  half  of  path  with  tracked  feature  locations 


c)  Second  half  of  path  with  tracked  feature  locations 


d)  Start/stop  area  detail 


Figure  6.23:  Estimated  path  and  feature  locations  from  indoor  data  collection  (run 
two)  for  consumer-grade  inertial  sensor.  Pane  (a)  shows  the  entire  path  estimate. 
Pane  (b)  shows  the  first  half  of  the  path  along  with  the  estimated  location  of  the 
features  (indicated  by  “x”  symbols).  Pane  (c)  shows  the  last  half  of  the  path  and 
estimated  feature  locations.  Pane  (d)  provides  detail  of  the  start/stop  area. 
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Although  experimental  flight  data  was  collected,  poor-quality  truth  data  were 
collected  due  to  installation  issues  with  the  GPS  antenna.  This  poor  GPS  observ¬ 
ability  resulted  in  a  low-quality  sensor  calibration  (see  Section  5.2.2).  As  such,  the 
resulting  image-aided  inertial  navigation  solution  is  difficult  to  evaluate.  Additional 
flight  data  collections  are  required  with  reliable  navigation  and  ground  truth  for  an 
accurate  system  analysis. 

As  a  result,  the  imaging  and  inertial  fusion  navigation  algorithm  is  evaluated 
using  a  simulated  flight  profile.  The  profile  is  designed  to  evaluate  the  performance  of 
the  system  in  a  flight  environment  using  various  levels  of  prior  knowledge  regarding 
the  scene.  The  simulation  setup  and  results  are  presented  in  the  next  section. 

6.3.1  Simulation.  Three  simulation  profiles  are  tested  to  evaluate  the  per¬ 
formance  of  the  system  with  varying  levels  of  prior  scene  knowledge.  These  knowledge 
levels  represent  three  cases.  Case  one  is  that  of  navigation  over  terrain  with  unknown 
optical  features,  which  is  similar  to  the  ground  navigation  problem  presented  in  the 
previous  section,  with  the  exception  that  a  statistical  terrain  model  is  used  to  estimate 
the  feature  distance  versus  binocular  stereopsis.  Case  two  evaluates  the  capability  of 
the  algorithm  to  recall  previously  estimated  landmark  locations  and  use  them  to  up¬ 
date  the  navigation  system.  This  would  be  representative  of  a  loitering  pattern  over 
unknown  territory.  Finally,  the  algorithm  is  evaluated  for  the  case  where  registered 
imagery  (and  the  resulting  registered  landmark  set)  are  available. 

In  order  to  match  the  experimental  data  collection,  the  simulation  trajectory 
is  a  14 -km,  oval  holding  pattern,  flown  at  1000  feet  above  ground  level  (AGL).  A 
groundspeed  of  50  m/s  was  flown,  which  results  in  a  10-minute  pattern.  The  terrain 
is  modeled  as  relatively  flat  with  a  5-meter  standard  deviation.  The  camera  is  modeled 
after  the  flight  system,  with  a  4.8 -mm  lens  and  angled  at  45  degrees  toward  the  ground. 
Both  the  H1700  tactical-grade  and  Crista  consumer-grade  IMUs  are  evaluated  in  the 
simulation.  Each  profile  begins  with  a  60  second  in-flight  alignment,  during  a  straight 
and  level  portion  of  the  trajectory.  The  filter  is  limited  to  12  concurrent  landmark 
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tracks.  A  total  of  30  simulated  runs  are  conducted  with  randomized  image  and  inertial 
measurements. 

The  tactical- grade  IMU  errors  are  shown  in  Figures  6.24-6.26.  Four  profiles 
are  simulated.  The  first  establishes  a  baseline  of  performance  for  the  tactical- grade 
IMU  with  only  barometric  altimeter  updates  and  no  image  aiding.  As  expected,  the 
free  inertial  horizontal  position  error  diverges  quickly.  Enabling  the  image  aiding 
using  landmarks  of  opportunity  shows  an  improvement  in  position  error.  After  1200 
seconds  and  two  laps  of  the  holding  pattern,  the  image-aided  horizontal  root-sum- 
squared  (RSS)  error  is  less  than  100  meters.  The  potential  benefits  of  leveraging 
previously-learned  landmarks  can  be  seen  at  620  seconds  into  the  profile,  where  the 
second  pass  of  the  holding  pattern  begins.  The  navigation  errors  are  reduced  due 
to  the  relatively  high-quality  landmark  location  estimates  available  from  the  initial 
portion  of  the  first  pass.  Finally,  enabling  the  use  of  reference  landmarks  with  known 
locations  shows  a  consistent,  meter-level,  horizontal  RSS  error,  which  indicates  the 
high-level  of  navigation  potential  available  from  an  image-aided  inertial  navigation 
system  combined  with  reference  images. 

The  Monte  Carlo  simulation  is  repeated  using  the  Crista  inertial  sensor  model. 
The  results  are  shown  in  Figures  6.27-6.29.  Due  to  the  relatively  low-quality  of  this 
sensor,  the  free  inertial  (with  barometric  altimeter  aiding)  solution  quickly  diverges 
and  is  not  useful  for  navigation  for  more  than  approximately  20  seconds.  Enabling 
image-aiding  using  landmarks  of  opportunity  shows  significant  improvement  in  the 
navigation  performance  while  also  revealing  a  significant  source  of  error  in  the  ex¬ 
tended  Kalman  filter,  which  is  amplified  by  the  low-quality  inertial  measurements. 
As  the  filter  errors  increase,  two  issues  occur  which  violate  the  assumptions  of  the 
extended  Kalman  filter:  1)  the  nonlinear  dynamics  were  developed  assuming  small 
errors,  and  2)  the  landmark  measurement  influence  matrices  are  calculated  assuming 
small  errors.  Both  issues  contribute  to  filter  instability  as  the  errors  in  the  naviga¬ 
tion  state  are  not  well  represented  by  the  state  error  covariance  maintained  by  the 
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Figure  6.24:  Simulated  30-run  Monte  Carlo  root-sum-squared  (RSS)  horizontal  po¬ 
sition  error  for  flight  profile  using  a  tactical-grade  inertial  sensor.  The  results  are 
shown  for  four  cases:  1)  image  updates  disabled,  2)  opportunity  landmark  track¬ 
ing  with  terrain  model,  3)  opportunity  landmark  tracking  with  self-learning,  and  4) 
landmark  with  known  location  tracking. 
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Figure  6.25:  Simulated  root-sum-squared  (RSS)  velocity  error  for  flight  profile  us¬ 
ing  a  tactical-grade  inertial  sensor.  The  results  are  shown  for  four  cases:  1)  image 
updates  disabled,  2)  opportunity  landmark  tracking  with  terrain  model,  3)  oppor¬ 
tunity  landmark  tracking  with  self-learning,  and  4)  landmark  with  known  location 
tracking.  The  errors  were  calculated  using  a  30-run  Monte  Carlo  analysis. 
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Figure  6.26:  Simulated  root-sum-squared  (RSS)  attitude  error  for  flight  profile  us¬ 
ing  a  tactical-grade  inertial  sensor.  The  results  are  shown  for  four  cases:  1)  image 
updates  disabled,  2)  opportunity  landmark  tracking  with  terrain  model,  3)  oppor¬ 
tunity  landmark  tracking  with  self-learning,  and  4)  landmark  with  known  location 
tracking.  The  errors  were  calculated  using  a  30-run  Monte  Carlo  analysis. 
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filter  [38] .  In  this  simulation,  the  filter  would  quickly  diverge  after  approximately  500 
seconds. 

In  order  to  investigate  this  issue  further,  the  Monte  Carlo  simulation  is  modified 
such  that  the  linearization  errors  for  image  updates  are  artificially  removed.  All  other 
error  sources  are  enabled.  Referencing  Figures  6.27-6.29,  shows  significantly  improved 
performance,  which  indicates  that  the  image-update  influence  matrix  calculation  is 
the  likely  culprit.  This  being  the  case,  particular  attention  must  be  paid  to  addressing 
these  fundamental  error  sources,  especially  when  large  errors  are  expected. 

The  final  Monte  Carlo  examines  the  performance  when  enabling  reference  land¬ 
marks  with  known  locations.  As  with  the  tactical-grade  simulation,  the  filter  shows 
a  consistent,  meter-level,  horizontal  RSS  error,  which  indicates  the  high-level  of  nav¬ 
igation  potential  available  from  an  image-aided  inertial  navigation  system  combined 
with  reference  images,  even  with  a  low-quality  inertial  reference.  In  this  scenario,  the 
reference  landmark  updates  dominate  the  performance  of  the  inertial  sensor,  which,  in 
essence,  causes  the  filter  to  use  the  inertial  sensor  primarily  to  statistically  constrain 
the  correspondence  search.  Once  a  successful  correspondence  is  determined,  the  filter 
weights  the  reference  landmark  update  strongly,  as  expected,  which  is  indicated  by 
the  comparable  performance  between  the  tactical  and  consumer  grade  sensors  using 
reference  landmark  updates. 

In  the  next  chapter,  conclusions  are  drawn  regarding  the  performance  of  the 
system  and  contributions  to  the  state  of  the  science  of  navigation  are  highlighted. 
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Time  (s) 

Figure  6.27:  Simulated  root-sum-squared  (RSS)  horizontal  position  error  for  flight 
profile  using  a  consumer- grade  inertial  sensor.  The  results  are  shown  for  four  cases: 
1)  image  updates  disabled,  2)  opportunity  landmark  tracking  with  terrain  model, 
3)  opportunity  landmark  tracking  with  linearization  error  removal,  and  4)  landmark 
with  known  location  tracking.  The  errors  were  calculated  using  a  30-run  Monte  Carlo 
analysis.  The  landmark  of  opportunity  simulated  results  were  truncated  prior  to 
impending  filter  divergence  due  to  violations  of  the  filter  assumptions  due  to  large 
errors. 
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Figure  6.28:  Simulated  root-sum-squared  (RSS)  velocity  error  for  flight  profile  us¬ 
ing  a  consumer-grade  inertial  sensor.  The  results  are  shown  for  four  cases:  1)  image 
updates  disabled,  2)  opportunity  landmark  tracking  with  terrain  model,  3)  opportu¬ 
nity  landmark  tracking  with  linearization  error  removal,  and  4)  landmark  with  known 
location  tracking.  The  errors  were  calculated  using  a  30-run  Monte  Carlo  analysis. 
The  landmark  of  opportunity  simulated  results  were  truncated  prior  to  impending 
filter  divergence  due  to  violations  of  the  filter  assumptions  due  to  large  errors. 
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Figure  6.29:  Simulated  root-sum-squared  (RSS)  attitude  error  for  flight  profile 

using  a  consumer-grade  inertial  sensor.  The  results  are  shown  for  four  cases:  1) 
image  updates  disabled,  2)  opportunity  landmark  tracking  with  terrain  model,  3) 
opportunity  landmark  tracking  with  linearization  error  removal,  and  4)  landmark 
with  known  location  tracking.  The  errors  were  calculated  using  a  30-run  Monte  Carlo 
analysis.  The  landmark  of  opportunity  simulated  results  were  truncated  prior  to 
impending  filter  divergence  due  to  violations  of  the  filter  assumptions  due  to  large 
errors. 
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VII.  Conclusions 


This  dissertation  presents  an  approach  for  fusing  optical  and  inertial  sensors 
for  robust,  self-contained,  passive,  autonomous  navigation.  In  this  chapter, 
conclusions  regarding  the  research  effort  are  presented  and  discussed.  In  addition, 
potential  areas  for  future  research  are  addressed. 

7. 1  Conclusions 

The  most  significant  conclusion  for  this  work  is  the  demonstration  of  the  viability 
of  fusing  imaging  and  inertial  sensors  for  navigation.  As  mentioned  in  Chapter  I,  the 
goal  for  this  research  is  to  develop  a  method  for  integrating  the  sensors  at  a  deeper  level 
than  the  current  state-of-the  art.  The  technique  is  incorporated  into  an  automated 
image-aided  inertial  navigation  system  and  tested  using  simulation  and  experimental 
methods. 

The  novel  sensor  fusion  technique  is  created  by  defining  the  feature  space  and 
developing  the  mathematics  required  to  perform  statistically  rigorous  transformations 
in  this  space  to  predict  the  features  at  a  future  time.  The  stochastic  projection 
method  successfully  leverages  the  collection  of  inertial  measurements  between  images 
to  predict  the  location  and  uncertainty  region  of  features  which  aids  the  attendant 
feature  correspondence  algorithm.  The  performance  of  this  algorithm  is  demonstrated 
using  both  simulation  and  experiments  in  Chapter  IV.  In  both  cases,  the  experiments 
confirm  the  validity  and  accuracy  of  the  stochastic  projections. 

The  feature  transformation  algorithm  is  derived  using  fundamental  stochastic 
models  for  various  imaging  conditions.  Using  these  models,  the  statistical  feature 
space  transformation  algorithm  is  applied  to  both  monocular  and  multi-sensor  de¬ 
signs  for  unknown,  partially-known,  and  with  fully-known  information  regarding  the 
world.  The  ability  to  accurately  predict  the  location  of  a  feature  statistically  using 
all  available  information  forms  the  basis  of  the  image  and  inertial  fusion  technique. 

In  addition  to  the  statistical  feature  transformation  techniques,  a  new  method 
of  calibration  of  imaging  and  inertial  sensors  is  presented  and  tested.  The  addition  of 
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self-calibration  states  to  the  Kalman  filter  allows  an  integrated  imaging  and  inertial 
sensor  to  self-estimate  calibration  parameters  using  known  optical  references  (e.g., 
celestial  bodies  and  surveyed  ground  points)  and  independent  measurements  of  the 
trajectory,  such  as  GPS.  This  technique  is  evaluated  using  simulation,  ground,  and 
flight  data.  In  each  experiment,  the  observability  of  the  self-calibration  states  is 
verified  and  shown  to  be  primarily  related  to  the  fundamental  accuracy  of  the  sensors. 
The  ability  to  automatically  correct  for  calibration  errors  using  this  technique  enables 
the  system  to  compensate  for  these  errors  outside  of  the  laboratory  environment. 

A  key  contribution  of  this  work  is  the  development  and  implementation  of  the 
deep-integration  theory  in  an  experimental  imaging  and  inertial  navigation  sensor. 
The  image-aided  inertial  navigation  system  is  evaluated  using  a  combination  of  simu¬ 
lation,  ground,  and  flight  tests.  In  each  experiment,  the  image-aided  inertial  naviga¬ 
tion  system  consistently  demonstrates  several  orders  of  magnitude  improvement  over 
the  equivalent  unaided  inertial  system  (and  even  that  of  navigation-grade  sensors). 
This  development  holds  a  great  deal  of  promise  to  produce  a  low-cost,  navigation- 
grade  optical-inertial  sensor  for  passive  environments.  In  addition,  the  demonstrated 
capability  to  store  self-surveyed  features  and  recall  them  when  needed  enables  this 
sensor  to  effectively  “learn”  an  environment  and  eliminate  navigation  error  drift  while 
remaining  in  known  areas.  A  number  of  operationally-realistic  scenarios  which  lever¬ 
age  this  revolutionary  concept  are  discussed  in  the  next  section. 

As  expected,  the  extended  Kalman  filter  navigation  algorithm  shows  the  dele¬ 
terious  effects  of  linearization  errors.  These  errors  are  most  prevalent  in  the  presence 
of  large  attitude  errors.  When  the  linearization  errors  are  artificially  corrected  during 
simulation,  the  navigation  filter  performance  is  consistent  with  the  system  model. 
These  results  show  that  the  extended  Kalman  filter  algorithm  delivers  sub-optimal 
performance  and  improvement  could  be  realized  using  an  estimator  which  does  not 
suffer  from  the  linearization  errors  of  the  extended  Kalman  filter.  This  conclusion 
shows  that  additional  research  into  estimation  algorithms  has  promise  and  should  be 
investigated.  Some  potential  approaches  are  presented  in  the  next  section. 
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In  summary,  this  research  develops  a  series  of  techniques  to  provide  autonomous, 
passive  navigation  by  incorporating  image  and  inertial  measurements.  The  method 
demonstrates  navigation-quality  performance  with  low-cost  sensors  and  passive  up¬ 
dates.  While  the  approach  demonstrates  greatly  improved  performance  over  free 
inertial  navigation  with  equivalent  sensors,  there  are  areas  which  merit  additional 
research.  These  are  addressed  in  the  next  section. 

7.2  Future  Work 

This  research  presents  a  statistically-rigorous  method  to  tightly  integrate  imag¬ 
ing  and  inertial  sensors.  The  method  is  used  to  construct  a  navigation  system  which 
leverages  optical  signals  of  opportunity  and  does  not  require  external  navigation  sig¬ 
nals.  In  addition  to  the  system  presented  in  this  research,  there  are  many  potential 
areas  of  research  which  exploit  the  natural  synergy  between  optical  and  inertial  sen¬ 
sors. 

The  first  general  area  of  additional  research  is  the  development  and  improvement 
of  feature  transformation  algorithms.  While  the  SIFT  algorithm  has  demonstrated 
potential  for  image- aided  navigation  applications,  there  are  other  feature  transforma¬ 
tion  algorithms  which  should  be  investigated.  A  revised  SIFT  algorithm  has  been 
proposed  by  Ke  [27]  which  replaces  the  feature  descriptor  calculation  of  SIFT  with  a 
principal  component  analysis  (PCA)-based  decomposition.  The  PCA-SIFT  algorithm 
has  demonstrated  improved  feature  matching  performance  over  the  traditional  SIFT 
algorithm. 

Conceptually,  the  most  desirable  feature  transformation  algorithm  is  one  in 
which  the  location  and  object  description  dimensions  are  truly  orthogonal.  In  simpler 
terms,  the  ideal  transformation  algorithm  would  detect  and  recognize  distinct  objects , 
in  a  similar  manner  to  humans.  Once  an  object  was  recognized,  an  additional  level 
of  navigation  information  could  be  extracted  from  the  object,  include  a  direct  mea¬ 
surement  of  orientation  and  range.  Inferring  range  information  from  the  apparent 
size  of  an  object  with  known  dimensions  is  referred  to  as  stadiametric  ranging  and 
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is  commonly  used  for  target  ranging  when  external  ranging  sources  are  unavailable. 
The  additional  information  provided  by  this  type  of  measurement  would  theoretically 
improve  the  navigation  performance  of  such  a  system,  and  could  reduce  the  effects  of 
linearization  errors  which  were  evident  in  the  extended  Kalman  filter  algorithm  used 
in  this  research. 

The  next  level  of  potential  research  is  to  explore  a  potentially  deeper  level  of 
integration  between  inertial  and  imaging  sensors.  The  concept  would  entail  incorpo¬ 
rating  inertial  measurements  at  the  feature  transformation  level.  In  general  terms, 
the  approach  can  be  clarified  as  follows.  As  discussed  in  Chapter  III,  the  propagation 
of  the  navigation  state  using  inertial  measurements  is  subject  to  the  effects  of  sensor 
errors,  such  that  an  uncertainty  region  exists  when  projecting  the  location  of  a  feature 
from  one  image  to  the  next.  This  uncertainty  region  is  a  function  of  three  primary 
sources  of  error:  1)  the  initial  uncertainty  of  the  navigation  state,  2)  the  accuracy 
of  the  inertial  sensors,  and  3)  the  integration  time.  Since  the  uncertainty  is  actually 
related  to  the  relative  navigation  error  between  frames,  most  of  the  initial  navigation 
state  uncertainty  is  mitigated  when  the  navigation  errors  are  relatively  constrained. 
Accordingly,  if  the  algorithm  could  dynamically  control  the  integration  time  (i.e.,  im¬ 
age  sample  rate),  the  uncertainty  region  for  feature  projection  between  images  could 
be  effectively  managed.  Thus,  given  a  minimum  feature  size  (in  the  feature  space), 
the  image  sample  rate  could  be  dynamically  varied  such  that  the  probability  of  false 
correspondence  would  be  greatly  reduced.  Viewing  the  measured  feature  space  as  a 
sampling  of  the  true  feature  space  reveals  parallel  concepts  between  this  method  of 
false  correspondence  elimination  and  sampling  theory.  In  fact,  this  proposed  tech¬ 
nique  is  a  method  of  feature  space  anti-aliasing  which  uses  a  dynamic  filter  based  on 
inertial  measurements. 

In  the  proposed  image-aided  navigation  filter  described  in  this  research,  the 
feature  extraction  calculations  consumed  the  majority  of  computer  processing  time. 
In  order  to  employ  such  a  system  in  real-time  required  optimization  of  the  attendant 
feature  transformation  algorithm.  Two  paths  for  improving  the  speed  of  these  calcula- 
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tions  are  proposed.  First,  the  feature  transformation  algorithm  could  be  accelerated 
by  exploiting  parallel  processing  and  pipelining.  As  shown  in  the  SIFT  algorithm, 
feature  extraction  can  be  accomplished  using  separable  scale-space  decompositions, 
which  is  ideal  for  parallel  processing.  The  next  improvement  would  constrain  the  fea¬ 
ture  extraction  algorithm  to  the  statistical  search  region  predicted  using  the  stochastic 
projection  method  developed  in  this  research.  By  limiting  the  feature  transformation 
calculation  to  a  much  smaller  area,  significant  speed  improvements  could  be  realized. 

Another  area  of  promising  research  would  investigate  potential  of  other  nonlin¬ 
ear  filtering  approaches.  As  mentioned  in  Chapter  VI,  the  extended  Kalman  filter  is 
sensitive  to  linearization  errors,  which  are  unmodeled.  Although  beyond  the  scope  of 
this  research,  additional  estimation  schemes  have  been  proposed  which  address  these 
issues.  Some  examples  are  iterated  extended  Kalman  filters,  unscented  particle  filters, 
and  multiple  model  extended  Kalman  filters.  In  addition,  there  is  potential  for  truly 
nonlinear  estimation  sensors  such  as  those  found  in  neural  networks. 

As  mentioned  previously,  there  exists  a  natural  synergy  between  imaging  and 
inertial  systems.  It  is  evident  based  on  this  research  that  construction  of  a  fused 
optical  and  inertial  sensor  would  provide  a  standard,  consistent,  and  affordable  plat¬ 
form  for  image-aided  inertial  navigation.  With  the  recent  advances  in  MEMS  and 
CMOS  imaging  technology,  an  image  and  inertial  sensor  integrated  (and  possible  fea¬ 
ture  transformation  circuits)  on  a  single  chip  could  open  the  doors  for  widespread 
usage.  In  addition  to  ensuring  excellent  time-synchronization  between  the  image  and 
inertial  sensors,  the  single  chip  solution  would  virtually  eliminate  the  need  for  field 
calibration,  and  could  result  in  a  very  economical,  and  robust  sensor. 

In  addition  to  the  benefits  in  cost  and  performance,  integrating  the  imaging 
and  inertial  sensor  at  the  production  level  creates  a  “standard”  image  and  inertial 
navigation  sensor  which  can  benefit  enormously  from  multi-sensor  designs.  Scenarios 
which  would  greatly  benefit  from  this  scaling  effect  are  classified  as  either  single  or 
multi-ship  implementations.  Mounting  multiple  integrated  sensors  on  a  single  vehicle 


179 


LANDMARK 


LANDMARK 


3 

Figure  7.1:  Multi-sensor  imaging  scenario  from  solo  vehicle.  Mounting  separate 

integrated  image  and  inertial  sensors  at  the  nose  and  from  a  towed  sensor  array  would 
provide  a  long  binocular  disparity  which  could  be  used  for  accurate  range  estimation, 
even  when  targets  are  not  simultaneously  in  the  field  of  view. 

would  provide  either  a  direct  (single-epoch)  or  time-delayed  stereoscopic  measure¬ 
ment,  which  would  help  to  reduce  the  range  estimation  error.  As  shown  in  Figure  7.1, 
mounting  separate  integrated  sensors  at  the  nose  and  tail  (or  towed  sensor  array) 
of  an  aircraft  would  benefit  from  a  relatively  long  binocular  disparity,  resulting  in 
accurate  range  estimation,  even  at  higher  altitudes. 

Multi-ship  implementations  of  the  fused  image  and  inertial  navigation  sensor 
offer  even  more  exciting  possibilities.  The  demonstrated  ability  of  the  sensor  algo¬ 
rithm  to  “remember”  features  would  allow  each  ship  in  the  formation  to  effectively 
share  observations  of  the  same  landmark(s).  This  would  not  only  allow  the  forma¬ 
tion  to  accurately  navigate  relative  to  the  world,  but  would  provide  accurate  location 
information  for  landmarks  of  interest  in  both  the  navigation  reference  frame  and  rel¬ 
ative  to  each  member  of  the  group.  In  addition,  because  the  navigation  solution  for 
each  sensor  is  processed  in  parallel,  scaling  effects  could  be  minimized.  Finally,  if 
ship-to-ship  ranging  information  were  available  (e.g.,  based  on  measurements  from 
an  existing  datalink),  the  performance  of  such  a  system  could  be  extremely  accurate 
and  resistant  to  long-term  navigation  error  growth,  especially  if  the  formation  geom- 


180 


LANDMARK 


Figure  7.2:  Multi-vehicle  cooperative  navigation  scenario.  Vehicles  equipped  with 

integrated  image  and  inertial  navigation  systems  could  improve  navigation  and  land¬ 
mark  location  accuracy  over  non-cooperative  systems,  especially  when  the  formation 
could  be  controlled  to  provided  maximum  observability. 

etry  could  be  controlled  to  maximize  observability.  A  sample  multi- ship,  cooperative 
navigation  scenario  is  shown  in  Figure  7.2. 

As  outlined  in  Chapter  IV,  exploiting  previously  recorded,  georeferenced  images 
for  navigation  can  provide  absolute  navigation  information,  which  is  critical  to  counter 
the  unbounded  long-term  drift  associated  with  landmark  of  opportunity  navigation. 
Based  on  simulation  analysis,  this  technique  has  the  accuracy  potential  to  navigate 
for  autonomous  landing  or  for  other  situations  where  GPS  may  not  be  available.  The 
most  important  aspect  of  all  of  these  techniques  is  the  inherent  flexibility  to  use  any 
available  sources  to  provide  the  best  navigation  estimate  possible.  For  example,  an 
image-aided  navigation  system  could  leverage  landmarks  of  opportunity  when  over 
unknown  terrain,  incorporate  measurements  from  known  landmarks  when  available, 
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and  recall  previously  estimated  landmarks  when  returning  to  base  or  loitering  in  a 
target  area. 

7. 3  Summary 

As  evidenced  by  nature,  there  is  a  natural  synergy  between  imaging  and  inertial 
sensors  for  navigation.  The  key  to  this  synergy  lies  in  coupling  both  sensors  at  the 
deepest  level  possible  to  generate  robust  navigation  estimates  under  as  many  imaging 
conditions  as  possible.  This  research  presents  a  step  in  that  direction,  however,  as 
discussed  in  the  previous  section,  there  is  much  work  to  do.  We  have  only  begun  to 
realize  the  potential  of  this  technology. 
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